arXiv: 1502.04475v2 [hep-ph] 10 Dec 2015 


Version 2.3 


Optimal modeling of 1D azimuth correlations in the context of Bayesian inference 

Michiel B. De Kock, 1 Hans C. Eggers, 1 and Thomas A. Trainor 2 

1 Stellenbosch University and National Institute for Theoretical Physics (NITheP), ZA-7600 Stellenbosch, South Africa 
2 CENPA 354290, University of Washington, Seattle, Washington 98195, United States 

(Dated: Version September 2015) 

Analysis and interpretation of spectrum and correlation data from high-energy nuclear collisions 
is currently controversial because two opposing physics narratives derive contradictory implications 
from the same data—one narrative claiming collision dynamics is dominated by dijet production 
and projectile-nucleon fragmentation, the other claiming collision dynamics is dominated by a dense, 
flowing QCD medium. Opposing interpretations seem to be supported by alternative data models, 
and current model-comparison schemes are unable to distinguish between them. There is clearly
a need for a convincing new methodology to break the deadlock. In this study we introduce Bayesian
Inference (BI) methods applied to angular correlation data as a basis to evaluate competing data
models. For simplicity the data considered are projections of 2D angular correlations onto 1D
azimuth from three centrality classes of 200 GeV Au-Au collisions. We consider several data models
typical of current model choices, including Fourier series (FS) and a Gaussian plus various
combinations of individual cosine components. We evaluate model performance with BI methods and with
power-spectrum (PS) analysis. We find that the FS-only model is rejected in all cases by Bayesian
analysis, which always prefers a Gaussian. A cylindrical quadrupole $\cos(2\phi)$ is required in some
cases but rejected for 0-5%-central Au-Au collisions. Given a Gaussian centered at the azimuth origin,
“higher harmonics” $\cos(m\phi)$ for $m > 2$ are rejected. A model consisting of Gaussian + dipole
$\cos(\phi)$ + quadrupole $\cos(2\phi)$ provides good 1D data descriptions in all cases.

PACS numbers: 25.75.-q, 25.75.Gz, 25.75.Nq, 25.75.Ld, 25.75.Bh 


I. INTRODUCTION 

A significant and persistent problem has emerged concerning models for high-energy
nucleus-nucleus (A-A) collision data from the Relativistic Heavy Ion Collider (RHIC) and
the Large Hadron Collider (LHC). Distinct classes of data models with divergent physics
implications are invoked to support two narratives: a high-energy physics (HEP)/jets
narrative in which the essential phenomenon is dijet production, and a quark-gluon
plasma (QGP)/flow narrative in which the essential phenomenon is a flowing dense QCD
medium or QGP and dijets play no significant role.

The HEP/jets narrative emerges spontaneously from an analysis program based on spectrum
and correlation data models derived from the observed differential structure of available
data. In contrast, models emerging from the QGP/flow narrative tend to rely on theoretical
motivations coupled with data and information selection (e.g. $p_t$ cuts, preferred A-A
centralities, ratio measures). A comparison of RHIC results and interpretations is
presented in Ref. [20].

For example, 2D angular correlations from high-energy nuclear collisions include only a
few structures common to all collisions from p-p to central Au-Au at RHIC energies. A
simple mathematical model of those structures describes almost all data accurately with
no significant residual structure. No theoretical assumptions motivated the data model.
Three of the four principal model elements have been interpreted post facto as
representing dijet production and projectile-nucleon dissociation. Interpretation of the
fourth element, an independent azimuth quadrupole, remains in question [24-30].
Differential analysis of hadron $p_t$ spectra reveals two components modeled by simple
functions. One component is identified with fragments from dijets described
quantitatively by QCD calculations [31]. Most spectrum and correlation structures appear
to be consistent with the HEP/jets narrative.
Alternative models motivated by the QGP/flow narrative include quantity $v_2$ [Fourier
coefficient of function $\cos(2\phi)$ fitted to 1D projections of 2D angular correlations]
interpreted to represent elliptic flow, a blast-wave spectrum model interpreted to
measure radial flow, spectrum ratio $R_{AA}$ interpreted to indicate jet quenching within
a dense QCD medium, and dihadron correlation analysis via background subtraction
interpreted to represent jet structure. “Higher harmonic” flows have been inferred
recently from azimuth distributions via Fourier-series models [33-35].

The same underlying particle data are therefore characterized and interpreted with
competing mathematical models applied to different data selections, variables and
measured quantities. Judgments on the validity and relative merits of competing data
models have relied historically on comparisons of minimum-$\chi^2$ values and qualitative
arguments based on consistency of a given narrative across selected measured quantities.
While such an approach might suffice when the underlying physical processes and models
are simple, the complexity of A-A phenomenology and lack of consistent quantitative
criteria have impeded progress in resolving conflicts.

To address this problem we require a formal context in which competing data models are
evaluated on a statistically sound basis, and a “best” model may be selected that either
does not rely on unspoken a priori physics assumptions or renders such assumptions
quantifiable. We suggest that this context exists in the form of Bayesian Inference (BI)
which provides both a formal mathematical framework and the necessary concepts to
represent prior knowledge, evaluate candidate models for different parameter values and
thereby establish value judgments on models as a whole. Each data model is rated not only
by how well it describes some data or how much data it describes well, but also by the
“cost” of the model in terms of complexity and parameter number (Occam penalty) and
associated physical assumptions.

In this study we focus on 1D projections onto azimuth $\phi$ of 2D angular correlations
reported in Ref. [9], currently one of the most contentious areas of RHIC/LHC data
analysis. We consider several popular data models and evaluate them according to BI
methods to determine whether a uniquely preferred data model can be established without
recourse to a priori physics assumptions.

This article is arranged as follows: Section II presents the basics of Bayesian
Inference. Section III describes Fourier power spectra (PS) and their properties.
Section IV summarizes analysis methods applied to correlation data. Section V introduces
the correlation data used for this study. Sections VI, VII and VIII apply BI and PS
methods to azimuth projections from three centralities of 200 GeV Au-Au collisions.
Section IX presents systematic-uncertainty estimates. Sections X and XI present
discussion and summary. Appendices A and B consider the geometry of BI analysis and
periodic peak arrays respectively.


II. BAYESIAN INFERENCE 

Bayesian Inference addresses the problem of relating parametrized model functions to
available data in an optimal manner. Given specific data values the best set of parameter
values for each model is determined based on the likelihood function. Several models are
then compared based on each model’s evidence, an integral measure defined below. The most
plausible and therefore preferred data model produces the largest evidence value.


A. The $\chi^2$ measure and model fits to data 


In the present study we focus on aspects of Bayesian Inference that correspond directly
with the methodology of $\chi^2$ minimization. Given a set of $N$ data points $(x_n, y_n)$
with experimentally determined standard errors $\sigma_n$ on $y_n$ the conventional
$\chi^2$ statistic evaluating the goodness of fit of model function $f(x | w)$ with $K$
parameters $w_k$ is

$$\chi^2 = \sum_{n=1}^{N} \frac{[y_n - f(x_n | w)]^2}{\sigma_n^2}. \qquad (1)$$


As stated in Ref. [30] the $\chi^2$ measure assumes a Gaussian distribution of data-sample
fluctuations about mean values, which we accept as a reasonable approximation for 1D
RHIC/LHC data projections. In what follows model functions are represented by $D(w)$, a
vector function mapping parameter space $w$ to data space $D$.

Most model comparisons are based on $\chi^2$/DoF, where the number of fit degrees of
freedom (DoF) is assumed to be the number of data points $N$ minus the number of free
model parameters $K$. Minimizing $\chi^2$ without considering the fit DoF is clearly
misleading since there are infinitely many models with $K = N$ free parameters that might
describe the same $N$ data points with $\chi^2 = 0$. We require a mechanism to penalize
excess model parameters such that a simple few-parameter model that describes the data
well may be favored over more-complex models. That mechanism exists in the form of
Bayesian Inference.
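
The point about $K = N$ can be made concrete with a short numerical sketch. The toy data, the polynomial models and the function names below are assumptions chosen for illustration and are not part of the analysis in this paper; the sketch evaluates the $\chi^2$ of Eq. (1) for a two-parameter fit and for a fit with as many parameters as data points.

```python
import numpy as np

def chi2(y, sigma, model_values):
    """Eq. (1): sum of squared residuals in units of the standard errors."""
    y, sigma, model_values = (np.asarray(a, dtype=float)
                              for a in (y, sigma, model_values))
    return float(np.sum(((y - model_values) / sigma) ** 2))

def chi2_per_dof(y, sigma, model_values, n_params):
    """chi^2 divided by fit DoF = N - K; undefined when K >= N."""
    dof = len(y) - n_params
    if dof <= 0:
        raise ValueError("fit DoF must be positive (need N > K)")
    return chi2(y, sigma, model_values) / dof

# Toy data (assumed for illustration): a straight line plus fixed "noise".
x = np.linspace(0.0, 1.0, 6)
noise = np.array([0.05, -0.08, 0.02, 0.11, -0.04, -0.06])
y = 1.0 + 2.0 * x + noise
sigma = np.full_like(x, 0.1)

# K = 2 parameters: straight-line fit.
line = np.polyval(np.polyfit(x, y, 1), x)
chi2_line = chi2(y, sigma, line)

# K = N = 6 parameters: the polynomial passes through every point.
full = np.polyval(np.polyfit(x, y, 5), x)
chi2_full = chi2(y, sigma, full)

print(chi2_line, chi2_per_dof(y, sigma, line, 2), chi2_full)
```

The six-parameter polynomial drives $\chi^2$ to zero although it only reproduces the noise, and $\chi^2$/DoF is not even defined for it, which is why a separate penalty on parameter number is needed.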


B. Logical and rational inference 

Distinction may be drawn between logical inference on the one hand, in which
nominally-valid conclusions are drawn via a logical chain of argument from premises
assumed to be true, and rational inference on the other, in which patterns or events
(i.e. data) are used to improve our understanding of the physical system, either
augmenting or displacing previous understanding. Both the acquired data and the modified
understanding may be uncertain to some degree as measured by probabilities. Rational
inference includes induction, in which newly-acquired data are employed to formulate or
refine a model, and deduction, in which a fixed model is used to predict values of data
not yet acquired [35].

Bayesian Inference is a formal recipe for rational inference based on Bayes’ theorem.
“Understanding” in this context means that reality in the form of data or data-derived
quantities is well described by a parametrized model. A given set of parameter values
predicts a specific set of possible data values. Previous understanding including
uncertainties is represented by the prior, a probability distribution function (PDF) on
possible parameter values. As new data are acquired BI provides a means to update the PDF
on model parameters to effect improved understanding in the form of the posterior PDF,
thereby refining the model by reducing the volume of its parameter space or falsifying
the model altogether if the new data fall outside the model’s predicted data volume.


C. The probability chain rule and Bayes’ theorem 

Bayesian Inference is based on relations among joint, conditional and marginal PDFs and
related unnormalized functions distributed on data and model-parameter spaces. External
factors common to all models that may influence the inference process are represented by
a comprehensive parameter set $Q$ suppressed below. Our notation follows that in
Refs. [39] and [43].

A model $H$ is defined by a joint PDF $p(wD|HQ) \rightarrow p(wD|H)$, where $w$ and $D$
are multidimensional spaces representing model-parameter values and data values. The
corresponding conditional PDFs are $p(w|DH)$ and $p(D|wH)$, and the marginal PDFs are
$p(w|H)$ and $p(D|H)$. The probability chain rule provides factorizations in the form
$p(wD|H) = p(w|DH)\,p(D|H) = p(D|wH)\,p(w|H)$. Bayes’ theorem (BT) can then be expressed
in either of two forms

$$p(w|DH) = \frac{p(D|wH)\,p(w|H)}{p(D|H)}$$
$$p(D|wH) = \frac{p(w|DH)\,p(D|H)}{p(w|H)} \qquad (2)$$

both of which are valid descriptions of a joint PDF. However, only the first line is
applicable to BI analysis, which proceeds from specific data values to an improved
parametrized data model, a unique BT application.


D. Prior and posterior PDFs — model fits 


As applied to BI analysis some quantities in the first line of Eq. (2) must be defined
more specifically. In this application quantity $D$ is not a variable on the space of all
possible data; it is a specific set of data values $D^*$ with uncertainties or errors
$\sigma_D$. Factor $p(D|wH)$, a normalized conditional PDF on data space $D$, is
redefined as the likelihood function $L(D^*|wH)$ on parameter space $w$ for model $H$
given specific data $D^*$ and model function $D(w)$. $p(w|H)$ is the prior PDF on model
parameters $w$ determined before data $D^*$ are available. $p(w|D^*H)$ is the posterior
PDF on parameters $w$ given the new data. Denominator $p(D|H)$, also a PDF on space $D$,
is redefined as the evidence (a number) for model $H$ given specific data $D^*$ which we
denote by the symbol $E(D^*|H)$. With those more-specific definitions the version of
Bayes’ Theorem used for BI is

$$p(w|D^*H) = \frac{L(D^*|wH)\,p(w|H)}{E(D^*|H)}, \qquad (3)$$

which can be read as “A posterior PDF on $w$ is derived from a prior PDF given data
$D^*$, likelihood $L$ and evidence $E$.” Any change between prior and posterior
represents information acquired by the model from the data. The result is an updated PDF
on model parameters determined by newly-acquired specific data values $D^*$. The
posterior PDF on parameters $w$ provides considerably more information about the model
than the best-fit parameter set $\hat{w}$ and uncertainties $\sigma_w$ derived from
conventional $\chi^2$ model fits to data.
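
As an illustration of how a prior is updated to a posterior, the following sketch evaluates Bayes’ theorem on a grid for a single-parameter model. The toy data, the uniform prior range $[0, 3]$ and all variable names are assumptions made for this example only; the likelihood is taken as Gaussian, $L \propto \exp(-\chi^2/2)$.

```python
import numpy as np

# Toy data D* (assumed): four measurements of one amplitude w, equal errors.
y = np.array([0.9, 1.1, 1.0, 1.2])
sigma = 0.1

# Parameter grid and a uniform ("ignorance") prior of width 3 on [0, 3].
w = np.linspace(0.0, 3.0, 3001)
dw = w[1] - w[0]
prior = np.full_like(w, 1.0 / 3.0)

# Gaussian likelihood up to a w-independent constant: L ~ exp(-chi^2 / 2).
chi2 = np.sum(((y[:, None] - w[None, :]) / sigma) ** 2, axis=0)
likelihood = np.exp(-0.5 * (chi2 - chi2.min()))  # rescaled to avoid underflow

# Posterior = likelihood * prior / evidence; the evidence normalises it.
evidence = np.sum(likelihood * prior) * dw  # in the same rescaled units
posterior = likelihood * prior / evidence

w_mode = w[np.argmax(posterior)]
w_mean = np.sum(w * posterior) * dw
w_rms = np.sqrt(np.sum((w - w_mean) ** 2 * posterior) * dw)
print(w_mode, w_mean, w_rms)
```

In this toy case the posterior mode coincides with the conventional best-fit value (the data mean) and the posterior r.m.s. width with $\sigma/\sqrt{N}$, but the full posterior array carries the complete shape information rather than only those two numbers.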


E. Model comparisons and evidence 


Beyond determining posterior PDFs on parameters $w$, Bayes’ Theorem can be used on a
higher level for comparisons among competing data models in the form

$$p(H|D^*) = \frac{E(D^*|H)\,p(H)}{p(D^*)}, \qquad (4)$$

where $p(H|D^*)$ is the plausibility of model $H$ given data values $D^*$ and $p(H)$ is
the prior model probability within some assumed context represented by $Q$ (suppressed).
The main goal of this study is comparison of competing model functions $D(w)$ with all
other BI elements maintained as similar as possible.

Evidence $E$ is just a normalization parameter in Eq. (3) but its absolute numerical
value is important for model comparisons. Because the likelihood is usually a peaked
function on $w$ with a single mode near some optimal parameter values $\hat{w}$, the
evidence defined in the first line below can be represented by Laplace’s approximation in
the second line [35]

$$E(D^*|H) = \int dw\, L(D^*|wH)\,p(w|H) \qquad (5)$$
$$\approx L(D^*|\hat{w}H)\,\sqrt{(2\pi)^K \det C_K}\; p(\hat{w}|H),$$

where $L(D^*|\hat{w}H)$ is the maximum likelihood and $C_K(D^*|\hat{w}H)$ is the
covariance matrix for model function $D(w)$ with $K$ parameters. The negative log
evidence is

$$-2\,LE \approx \chi^2(D^*|\hat{w}H) + 2I(D^*|H) + \text{constant} \qquad (6)$$

with the usual $\chi^2$ parameter, and information $I$ is defined by

$$I(D^*|H) = -\ln\left[\sqrt{(2\pi)^K \det C_K}\; p(\hat{w}|H)\right], \qquad (7)$$

the information gained by model $H$ from specific data $D^*$. Information is the log of
a volume ratio as discussed in the next subsection. In general $\chi^2$ decreases and $I$
increases as parameter-number index $K$ increases. The sum $-2\,LE$ should then have a
minimum corresponding to the maximum evidence for a specific model. For an optimized
predictive model (e.g. a theory) $I \approx 0$ and $\chi^2 \approx$ fit DoF (= data DoF
$N$ minus model DoF $K$).
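
The relation between the evidence integral and its Laplace approximation can be checked numerically. The sketch below is a toy check under stated assumptions (one-parameter model, Gaussian likelihood $\exp(-\chi^2/2)$ with its constant prefactor dropped, uniform prior of width 3, hypothetical function names); in this linear case the approximation is exact, so the two numbers should agree.

```python
import numpy as np

def information(cov, prior_widths):
    """Information I = ln(prior volume / posterior volume), uniform prior."""
    cov = np.atleast_2d(np.asarray(cov, dtype=float))
    k = cov.shape[0]
    posterior_volume = np.sqrt((2.0 * np.pi) ** k * np.linalg.det(cov))
    prior_volume = float(np.prod(prior_widths))
    return float(np.log(prior_volume / posterior_volume))

def neg2_log_evidence(chi2_min, cov, prior_widths):
    """-2 ln E = chi2_min + 2 I, without the model-independent constant."""
    return chi2_min + 2.0 * information(cov, prior_widths)

# Toy check (assumed data): one amplitude w, four points, equal errors.
y = np.array([0.9, 1.1, 1.0, 1.2])
sigma, width = 0.1, 3.0
w_hat = y.mean()
chi2_min = float(np.sum(((y - w_hat) / sigma) ** 2))
cov = sigma**2 / len(y)  # variance of the fitted amplitude
laplace = neg2_log_evidence(chi2_min, cov, [width])

# Brute-force evidence integral on a fine grid over the prior range [0, 3].
w = np.linspace(0.0, width, 30001)
chi2 = np.sum(((y[:, None] - w[None, :]) / sigma) ** 2, axis=0)
evidence = np.sum(np.exp(-0.5 * chi2) / width) * (w[1] - w[0])
numeric = -2.0 * np.log(evidence)
print(laplace, numeric)
```

For nonlinear model functions the likelihood is not exactly Gaussian in $w$ and the two values would differ; the size of that difference is one practical measure of how far the Laplace approximation can be trusted.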

Quick and easy comparisons between two models $H_1$ and $H_2$ can be obtained by
calculating the evidence ratio $E(D^*|H_1)/E(D^*|H_2)$, also known as an odds ratio.
Assuming equal model priors $p(H_1) = p(H_2)$ the Bayes Factor is

$$B_{12} = \frac{p(H_1|D^*)}{p(H_2|D^*)} = \frac{E(D^*|H_1)}{E(D^*|H_2)}. \qquad (8)$$

Comparisons among more than two models indexed by $l$ are effected by

$$p(H_l|D^*) = \frac{E(D^*|H_l)\,p(H_l)}{\sum_{l'} E(D^*|H_{l'})\,p(H_{l'})}, \qquad (9)$$

where $\sum_l E(D^*|H_l)\,p(H_l)$ replaces $p(D^*)$ in Eq. (4). The model priors $p(H_l)$
could be set equal assuming ignorance, but in practice assigned model priors may differ
sharply among competing models, possibly reflecting strong prejudices.
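
A minimal sketch of how such comparisons might be computed from log evidences follows. The function names and the three log-evidence values are hypothetical, chosen only to show the arithmetic; working with logarithms and subtracting the maximum avoids numerical underflow when evidences are very small.

```python
import numpy as np

def model_probabilities(log_evidences, priors=None):
    """Posterior model probabilities from ln E_l and model priors p(H_l)."""
    log_e = np.asarray(log_evidences, dtype=float)
    if priors is None:  # equal priors ("ignorance")
        priors = np.full(log_e.shape, 1.0 / log_e.size)
    log_w = log_e + np.log(np.asarray(priors, dtype=float))
    log_w -= log_w.max()  # common factor cancels in the ratio
    weights = np.exp(log_w)
    return weights / weights.sum()

def bayes_factor(log_e1, log_e2):
    """Evidence ratio E_1 / E_2 for equal model priors."""
    return float(np.exp(log_e1 - log_e2))

# Hypothetical log evidences for three competing models.
log_e = [-10.0, -12.0, -15.0]
probs = model_probabilities(log_e)
b12 = bayes_factor(log_e[0], log_e[1])
print(probs, b12)
```

Passing unequal `priors` shows directly how strongly an assigned model prior can shift the resulting plausibilities relative to the evidence ratio alone.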

Our use of differences in log Evidence (Bayes factors) rather than isolated values is
consistent with the use of Likelihood ratios (e.g. Neyman-Pearson approach). Evidence
ratios are an improvement on Likelihood ratios because the latter assume delta-function
priors.


F. Bayesian priors and Information 


Information is generally defined as the logarithm of a volume ratio, the volumes being
subsets of some space of alternatives before and after a message (data) is received
conveying information. For instance, if a message reduces the number of possible
alternatives by factor 2 then the amount of information received is
$\log_2(V_1/V_2 = 2) = 1$: one “bit” of information is provided by the message. Several
definitions of information have been formulated (e.g. Shannon, Renyi), and the precise
correspondence to a volume ratio varies from case to case. In some cases the terms
“information” and “entropy” may be used interchangeably such that for example
“information gain” may represent the difference between two entropies.

In Eq. (7) factor $p(\hat{w}|H)$ is related to the prior volume $V_w(H)$ of a model
parameter space and $\sqrt{(2\pi)^K \det C_K}$ approximates the posterior volume
$V_w(D^*|H)$. Thus, information $I(D^*|H)$ is defined here as the natural log of the
prior volume over the posterior volume. A prior PDF based on ignorance (uniform or
translation-invariant probability within some assumed boundaries for each parameter) is
estimated by the product

$$p(w|H) \approx \prod_{k=1}^{K} \frac{1}{\Delta_k}, \qquad (10)$$

where the estimated $\Delta_k$ for amplitude parameters may be based on differences of
data extreme values, but the prior for angle parameters depends on circumstances. In
this study the condition $\sigma_{\phi_\Delta} \in [0, \pi/2]$ is based on the definition
of the same-side peak at the azimuth origin.

Since typical correlation-structure amplitudes (e.g. peak-to-peak excursions) are
generally $< O(1)$ and given the assumed constraint on the Gaussian width we assign
$\Delta_k = 1$ for those cases. Given certain algebraic relations it is reasonable to
assume that cosine coefficients and uncertainties may be substantially smaller on average
than the Gaussian amplitude and width. For all cosine components in any model we assign
$\Delta_k = 1/3$. Given those assignments the basic Model (defined below) is somewhat
disadvantaged (smaller prior probability) compared to models based only on cosine terms.
Further discussion of prior construction is found in Ref. [40].
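
The effect of those width assignments on the prior density can be sketched numerically. The two four-parameter models compared below are hypothetical parameter counts chosen for illustration (they are not the specific models evaluated later), and `uniform_prior_density` is an assumed helper name.

```python
import numpy as np

def uniform_prior_density(widths):
    """Product of 1/Delta_k over the K model parameters (uniform priors)."""
    return float(np.prod(1.0 / np.asarray(widths, dtype=float)))

# Hypothetical 4-parameter models using the widths quoted in the text:
# Delta = 1 for Gaussian amplitude and width, Delta = 1/3 for cosine terms.
gauss_plus_cosines = [1.0, 1.0, 1.0 / 3.0, 1.0 / 3.0]
cosines_only = [1.0 / 3.0] * 4

p_gauss = uniform_prior_density(gauss_plus_cosines)
p_cos = uniform_prior_density(cosines_only)

# Prior handicap of the Gaussian-based model in natural-log units.
log_prior_ratio = float(np.log(p_cos / p_gauss))
print(p_gauss, p_cos, log_prior_ratio)
```

With equal parameter number the cosine-only model starts with a prior density larger by a factor of 9 in this toy comparison, which is the sense in which a Gaussian-based model is somewhat disadvantaged before any data are considered.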

The posterior volume is obtained from the determinant of the covariance matrix
$\det C_K$ which, in the absence of significant covariances, is the product of the
variances for the several model parameters. Its square root is then the product of r.m.s.
widths on parameters, the posterior volume. In this study the Hessian (matrix of
second-order derivatives at maximum of the Likelihood function derived from data $D^*$)
is obtained, and the covariance matrix is constructed from the Hessian elements.
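
One common way to carry out that construction is sketched below, under assumptions that go beyond what the text specifies: the Hessian is taken of the negative log-likelihood ($= \chi^2/2$ up to a constant for Gaussian errors), it is estimated by central finite differences, and the covariance matrix is its inverse. The toy quadratic likelihood and all names are illustrative.

```python
import numpy as np

def numerical_hessian(f, w, h=1e-4):
    """Central-difference matrix of second derivatives of f at w."""
    w = np.asarray(w, dtype=float)
    k = w.size
    hess = np.empty((k, k))
    for i in range(k):
        for j in range(k):
            e_i = np.zeros(k)
            e_j = np.zeros(k)
            e_i[i] = h
            e_j[j] = h
            hess[i, j] = (f(w + e_i + e_j) - f(w + e_i - e_j)
                          - f(w - e_i + e_j) + f(w - e_i - e_j)) / (4.0 * h * h)
    return hess

# Toy negative log-likelihood (assumed): exactly quadratic around w_hat.
w_hat = np.array([0.5, 0.8])
curvature = np.array([[4.0, 1.0], [1.0, 3.0]])

def neg_log_like(w):
    d = np.asarray(w, dtype=float) - w_hat
    return 0.5 * d @ curvature @ d

hessian = numerical_hessian(neg_log_like, w_hat)
cov = np.linalg.inv(hessian)           # covariance = inverse Hessian
rms_widths = np.sqrt(np.diag(cov))     # r.m.s. width of each parameter
posterior_volume = np.sqrt((2.0 * np.pi) ** len(w_hat) * np.linalg.det(cov))
print(cov, rms_widths, posterior_volume)
```

Because the off-diagonal curvature is nonzero here, the posterior volume is not simply proportional to the product of the r.m.s. widths; the determinant accounts for the covariance.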

The information defined in Eq. (7) permits a quantitative expression of Occam’s razor in
two ways: (a) For a model with a large prior volume in parameter space (representing
many “causes”, some possibly unnecessary) a substantial reduction in the parameter volume
on encountering data $D^*$ automatically incurs an Occam penalty by means of larger $I$.
(b) The $K$-dependence of $I$ implies that while models with more parameters may have a
smaller $\chi^2$ and larger likelihood, the extra parameters are also penalized by
increased $I$ resulting in reduced overall model plausibility.


III. FOURIER POWER SPECTRUM 

The Fourier power spectrum (PS) is an alternative information measure well understood in
the context of signal processing. Comparison of PS results with BI analysis may better
convey the technical details and interpretations of the latter.

The Wiener-Khinchin theorem [47] states that the Fourier transform of a two-particle
autocorrelation is the corresponding power spectrum of an underlying single-particle
distribution. Data autocorrelations $A(\phi_n)$ with $N$ elements are periodic,
symmetrized about 0 and $\pi$ and described by a PS with $m \in [0, N-1]$ and
$P_m = P_{N-m}$. The PS expansion of autocorrelation data

$$A(\phi_n) = \sum_{m=0}^{N-1} P_m \cos(m\phi_n) \qquad (11)$$
$$= P_0 + \sum_{m=1}^{N/2} P_m\, 2\cos(m\phi_n)$$

might be viewed as a model function from which the power-spectrum elements $P_m$ could
be determined by model fitting. However, in this study the PS elements are obtained
directly by integrating the data

$$P_m = \frac{1}{2\pi}\int_0^{2\pi} d\phi_\Delta \cos(m\phi_\Delta)\,A(\phi_\Delta) \qquad (12)$$
$$\rightarrow \frac{1}{N}\sum_{n=0}^{N-1} \cos(m\phi_n)\,A(\phi_n).$$

Note that $A(0)$ is the “total power” $\sum_{m=0}^{N-1} P_m$ (with $N/2 + 1$ independent
elements), and $P_0$ is the mean value of the 1D autocorrelation (which, for data
histograms introduced below and used in this study, is set to zero).
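
The discrete form of the power spectrum can be illustrated as follows. The toy autocorrelation (a periodic Gaussian peak at the origin plus a cosine term, with the mean subtracted) and the function names are assumptions for this sketch; the bin positions $\phi_n = 2\pi n/N$ with $N = 24$ follow the binning used in this study.

```python
import numpy as np

def power_spectrum(autocorr):
    """P_m = (1/N) sum_n cos(m phi_n) A(phi_n) for m = 0 .. N-1."""
    a = np.asarray(autocorr, dtype=float)
    n_bins = a.size
    phi = 2.0 * np.pi * np.arange(n_bins) / n_bins
    m = np.arange(n_bins)
    return np.cos(np.outer(m, phi)) @ a / n_bins

def reconstruct(ps):
    """A(phi_n) = sum_m P_m cos(m phi_n), the full-sum PS expansion."""
    ps = np.asarray(ps, dtype=float)
    n_bins = ps.size
    phi = 2.0 * np.pi * np.arange(n_bins) / n_bins
    m = np.arange(n_bins)
    return np.cos(np.outer(phi, m)) @ ps

# Toy symmetrized autocorrelation on N = 24 azimuth bins (assumed shape).
N = 24
phi = 2.0 * np.pi * np.arange(N) / N
peak = (np.exp(-0.5 * (phi / 0.7) ** 2)
        + np.exp(-0.5 * ((phi - 2.0 * np.pi) / 0.7) ** 2))
A = peak - 0.3 * np.cos(phi)
A = A - A.mean()  # zero-sum histogram, so P_0 = 0

P = power_spectrum(A)
total_power = P.sum()
print(P[: N // 2 + 1], total_power, A[0])
```

For data symmetric about 0 the spectrum obeys $P_m = P_{N-m}$, so only the first $N/2 + 1$ elements printed above are independent, and their full sum reproduces $A(0)$.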

The power spectrum for a sample sequence may contain a deterministic “signal” component
and a random (white) noise component. The signal may be localized at smaller wave number
(index $m$), while an approximately flat white-noise spectrum is revealed at larger index
values if the sample rate or bin number (resolution) is large enough (see Nyquist
frequency limit below). The white-noise amplitude should correspond to the estimated
statistical (Poisson) error used in $\chi^2$ fits to a sample sequence and to the r.m.s.
error inferred from fit residuals.

The Nyquist limit applied to periodic azimuth implies that the power spectrum must be
symmetric about the bin on $m$ containing $N/2$. For $N = 24$ there are then 12 + 1
independent PS elements (including $m = 0$) and 13 unique autocorrelation data bins whose
contents may be correlated by one or more parent processes. For the broader correlation
structures considered here the bin number, and therefore the Nyquist limit, is adequate.
For the narrower BE/electron peak (defined below) the bin number (hence angle resolution)
is insufficient, but that structure is not important for this analysis.

Power spectra (PS) should be distinguished from Fourier series (FS-only) fit models. A PS
consisting of elements $P_m$ evaluated for all index values $m \in [0, N/2]$ completely
characterizes a data autocorrelation. FS-only models have a varying number of elements
indexed by $k \le K$, $K \in [1, N/2]$ being the number of parameters for a model.


IV. ANALYSIS METHODS 

High-energy nuclear collisions at the RHIC and LHC produce hadrons in each collision
ranging in number from a few to thousands (depending on collision centrality) via several
physical mechanisms. By studying properties of hadron yields, spectra and correlations we
seek to identify and characterize the various underlying mechanisms. In this study we
apply BI methods to evaluate several mathematical models of 2D angular correlations
projected to 1D azimuth. In this section we summarize basic analysis methods that produce
the angular correlation data and our strategy for BI evaluation of the data models.


A. Kinematic variables and spaces 

High-energy nuclear collisions are described efficiently within a cylindrical coordinate
system $(p_t, \eta, \phi)$ where (relative to the collision axis) $p_t$ is the transverse
momentum, $\phi$ is the azimuth angle from a reference direction and pseudorapidity
$\eta = -\ln[\tan(\theta/2)] \approx \cos(\theta)$ is a measure of polar angle $\theta$,
the approximation being valid near $\eta = 0$ ($\theta = \pi/2$). A bounded detector
angular acceptance is denoted by intervals $(\Delta\eta, \Delta\phi)$ on the primary
single-particle space $(\eta, \phi)$.

In general, two-particle correlations are measured on the 6D space
$(p_{t1}, \eta_1, \phi_1, p_{t2}, \eta_2, \phi_2)$. $p_t$-integral angular correlations
are measured on the 4D space $(\eta_1, \phi_1, \eta_2, \phi_2)$. Within a limited $\eta$
acceptance and over $2\pi$ azimuth the angular correlation structure may be approximately
invariant along a sum axis $x_\Sigma = x_1 + x_2$ (stationarity). In that case averages
along $x_\Sigma$ for each value of the corresponding difference variable
$x_\Delta = x_1 - x_2$ comprise an autocorrelation $A(x_\Delta)$. Angular correlations on
$(\eta, \phi)$ are then measured as 2D densities $A(\eta_\Delta, \phi_\Delta)$ without
significant loss of information [55].


B. A-A centrality measures 

A-A collision centrality is measured by comparing a measured minimum-bias (MB) event
distribution on charge multiplicity $n_{ch}$ within some fiducial angular acceptance with
a Glauber Monte Carlo model of A-A collisions producing MB distributions on nucleon
participant number $N_{part}$ and N-N binary-collision number $N_{bin}$ [49]. The
intermediary is the A-A fractional cross section $\sigma/\sigma_0$. For the data employed
in this study centrality is designated by fractional cross section in percent, where 100%
refers to extreme peripheral collisions and 0% refers to head-on collisions. Collision
events were sorted into eleven centrality bins: ten equal 10% centrality bins with the
most-central 10% bin split into two 5% bins. The bins are numbered 0 (most peripheral)
through 10 (most central). The three (corrected) centrality intervals used in this study
are 0-5% (bin 10), 9-18% (bin 8) and 83-94% (bin 0).


C. Correlation measures 

Correlation structure is identified by comparing a 2D pair density
$\rho(\eta_\Delta, \phi_\Delta)$ with a reference density $\rho_{ref}$ representing no
significant correlations or some uninteresting background structure. $\rho_{ref}$ can be
based for instance on a factorization assumption ($\rho_{ref} = \rho_0^2$) or a
distribution of mixed pairs formed from different but similar sample events
($\rho_{ref} = \rho_{mix}$). The difference $\Delta\rho = \rho - \rho_{ref}$ should
reveal correlation structure of interest.

Correlation structure may have several components arising from different collision
mechanisms. Correlation amplitudes may vary with collision conditions in characteristic
ways, for instance proportional to $n_{ch}$, $N_{part}$, $N_{bin}$ or some combination.
As a placeholder we define a per-particle measure $\Delta\rho/\sqrt{\rho_{ref}}$ since
$\rho_{ref} \approx \rho_0^2$ according to a factorization assumption, and
$\rho_0 = d^2 n_{ch}/d\eta\, d\phi$ is the mean single-particle charge density near the
angular origin. Practically speaking the correlation measure is obtained as

$$\frac{\Delta\rho}{\sqrt{\rho_{ref}}} \equiv \rho_0 \left[\frac{\rho}{\rho_{mix}} - 1\right], \qquad (13)$$

where the ratio inside the square brackets reduces certain instrumental effects [9]. In
what follows we refer to symbol $A$ to simplify notation.

A 2D autocorrelation in the form $\Delta\rho/\sqrt{\rho_{ref}} \rightarrow A$ is a
density defined with the prefactor $d^2 n_{ch}/d\eta\, d\phi$. When integrated over
$\eta_\Delta$ the autocorrelation is a density on $\phi_\Delta$ defined by prefactor
$dn_{ch}/d\phi$. Integration of $A(\phi_\Delta)$ over the azimuth acceptance should then
give $2\pi A \approx 0$ since $\rho_{mix}$ has the same pair number as $\rho$ by
construction.
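
A toy version of the per-particle measure $\rho_0[\rho/\rho_{mix} - 1]$ may help to fix ideas. The pair histograms below are invented for illustration (a flat mixed-pair reference and a same-event histogram modulated by two cosines), and `per_particle_correlation` and the value of `rho_0` are assumed names and numbers, not quantities from the analysis.

```python
import numpy as np

def per_particle_correlation(rho_sib, rho_mix, rho_0):
    """Delta rho / sqrt(rho_ref) estimated as rho_0 * (rho / rho_mix - 1)."""
    rho_sib = np.asarray(rho_sib, dtype=float)
    rho_mix = np.asarray(rho_mix, dtype=float)
    return rho_0 * (rho_sib / rho_mix - 1.0)

# Toy 1D pair histograms on N = 24 azimuth-difference bins (assumed values).
N = 24
phi = 2.0 * np.pi * np.arange(N) / N
rho_mix = np.full(N, 100.0)  # uncorrelated reference
rho_sib = 100.0 * (1.0 + 0.02 * np.cos(phi) + 0.01 * np.cos(2.0 * phi))

rho_0 = 5.0  # assumed mean single-particle density
A = per_particle_correlation(rho_sib, rho_mix, rho_0)

# Equal pair numbers in rho_sib and rho_mix give a zero-sum histogram here.
print(rho_sib.sum() - rho_mix.sum(), A.sum())
```

With a flat reference the zero-sum property is exact; for a non-uniform mixed-pair reference it would hold only approximately, consistent with the "$\approx 0$" in the text.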


D. Bayesian Inference strategy 

For each 1D data histogram we construct a PS as a reference for BI analysis and identify
within the PS the signal and noise components. PS structure can be related 1-to-1 with BI
elements, helping to clarify interpretation of the latter. Based on results from Ref. [9]
we compare the PS for a fitted 1D Gaussian with each data PS.

For each model function $D(w)$ we obtain the minimum $\chi^2$ (maximum likelihood
describing fit quality) and information $I$ (derived from priors and covariance matrix)
from fits to data histograms. We obtain evidence $E$ for each model from a combination of
minimum $\chi^2$ and information $I$. Competition between $\chi^2$ and $I$ contrasts
goodness of fit (via $\chi^2$) with quantitative assessment of model-parameter “cost” or
Occam penalty (via $I$). One model function may achieve a quantitatively better fit to
data than another model, but at the cost of extra model parameters that may favor the
second model overall.

We emphasize that the number of data DoF in this study is small, only 11 for the
projected 1D histograms analyzed here compared to the original 2D histograms with 169
DoF. The small number of data DoF presents unique challenges for data modeling and BI
evaluation.


V. CORRELATION DATA AND MODELS 


The data we consider were published in the form of 2D binned histograms
(autocorrelations) derived from 1.2M 200 GeV Au-Au collision events sorted into eleven
centrality classes based on charged-particle multiplicity $n_{ch}$ [9]. Depending on
centrality each collision event may include from a few to more than a thousand charged
particles within the detector acceptance $(\Delta\eta, \Delta\phi) = (2, 2\pi)$.

In the present study we consider 1D projections of the 2D histograms onto azimuth
difference $\phi_\Delta$, represented as $\phi$ to simplify notation. The histogram bin
size on azimuth is $\delta\phi = 2\pi/N$ ($N = 24$ bins). The position variable is then
$\phi_n = 2\pi n/N$ with $n \in [0, N-1]$. The conjugate index for a PS (Sec. III) is
$m \in [0, N-1]$. The argument of PS cosines is $m\phi_n = 2\pi m n/N$. The bin size has
been optimized to match the observed correlation structure and provides sufficient
resolution to retain all information in the data, as indicated for instance by the power
spectrum in Fig. 3.

The 2D data are symmetrized on both $\eta_\Delta$ and $\phi_\Delta$. Thus, only one
quadrant of each 2D histogram is unique. The statistical errors on $\phi_\Delta$ are
uniform except for bins at 0 and $\pi$ where they are $\sqrt{2}$ larger. The errors on
$\eta_\Delta$ are strongly varying due to the triangular pair acceptance on
$\eta_\Delta$, with the largest errors at the acceptance edges $|\eta_\Delta| \approx 2$.
As noted, the 2D correlation histograms sum to zero by construction. We also adjust the
1D projections onto $\phi_\Delta$ to zero sum leading to one less data DoF (12).


A. Correlation data histograms 

Figure 1 (left panels) shows 200 GeV Au-Au 2D angular correlations for centrality bin 0
(83-94%, $\approx$ N-N collisions) and bin 10 (0-5%). Within the STAR TPC acceptance the
$p_t$-integral correlation data from Au-Au collisions include four principal components:
(a) a same-side (SS) 2D peak at the origin on $(\eta_\Delta, \phi_\Delta)$ well
approximated by a 2D Gaussian for all $p_t$-integral data, (b) an away-side (AS) 1D peak
on azimuth well approximated by an AS dipole $[\cos(\phi_\Delta - \pi) + 1]/2$ for all
data and uniform to a few percent on $\eta_\Delta$ (having negligible curvature), (c) an
azimuth quadrupole $\cos(2\phi_\Delta)$ also uniform on $\eta_\Delta$ to a few percent
over the full angular acceptance of the STAR TPC, and (d) a narrow 1D peak on
$\eta_\Delta$. There is also a sharp 2D exponential peak at (0,0). That phenomenological
description does not rely on physical interpretations of the components.



FIG. 1: (Color online) Left: 2D angular autocorrelations from 200 GeV Au-Au collisions
for (a) 83-94% ($\approx$ N-N collisions) and (c) 0-5% centralities. Right:
Two-dimensional model fits to the histograms in the left panels obtained with Eq. (14).

Based on subsequent comparisons of observed data systematics with theory the components
(a) and (b) together are interpreted to represent minimum-bias dijets [21]. Component (c)
has been conventionally attributed to elliptic flow. Component (d) is attributed to
projectile-nucleon dissociation. And the 2D exponential is attributed to Bose-Einstein
(quantum) correlations and charge-neutral electron pairs from photoconversions (denoted
as the BE/electron peak).


B. Correlation data models 


The fit methods employed here are based on the the 
non-Fisherian ansatz that data can be represented as the 
sum of a hypothesis (any competing data parametriza- 
tion) plus noise. 2D histograms from Ref. [5j [e.g. Fig .0 
(a) and (c)] were fitted with a data model including sev¬ 
eral elements applicable to higher RHIC energies and all 
Au-Au centralities. The 11-parameter model is 


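$$
A(\eta_\Delta,\phi_\Delta) = A_{2D} \exp\left\{-\frac{1}{2}\left[\left(\frac{\phi_\Delta}{\sigma_{\phi_\Delta}}\right)^2 + \left(\frac{\eta_\Delta}{\sigma_{\eta_\Delta}}\right)^2\right]\right\}
 + A_D [\cos(\phi_\Delta - \pi) + 1]/2 + A_0
 + A_Q\, 2\cos(2\phi_\Delta)
 + A_{soft} \exp\left\{-\frac{1}{2}\left(\frac{\eta_\Delta}{\sigma_{soft}}\right)^2\right\}
 + A_{BE} \exp\left\{-\left[\left(\frac{\phi_\Delta}{w_{\phi_\Delta}}\right)^2 + \left(\frac{\eta_\Delta}{w_{\eta_\Delta}}\right)^2\right]^{1/2}\right\}. \tag{14}
$$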



the PS introduced in Sec. ITTT1 


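$$
A(\phi_\Delta) = A_{1D} \exp\left\{-\frac{1}{2}\left(\frac{\phi_\Delta}{\sigma_{\phi_\Delta}}\right)^2\right\}
 - A'_D\, 2\cos(\phi_\Delta) + A_0 + A_Q\, 2\cos(2\phi_\Delta). \tag{16}
$$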


A further simplification is possible for the most-central
(0-5%) bin. The quadrupole amplitude $A_Q$ for that centrality
is observed to be consistent with zero [29, 30]. The
1D model then includes only 4 parameters

$$
A(\phi_\Delta) = A_{1D} \exp\left\{-\frac{1}{2}\left(\frac{\phi_\Delta}{\sigma_{\phi_\Delta}}\right)^2\right\}
 + A_0 - A'_D\, 2\cos(\phi_\Delta). \tag{17}
$$


Integrating Eqs. © and © with differential factor 
$d\phi_\Delta = 2\pi/N$ gives


$$
2\pi P_0 = A_{1D} \sqrt{2\pi}\, \sigma_{\phi_\Delta} + 2\pi A_0 \quad \text{or} \quad
A_0 = -A_{1D}\, \sigma_{\phi_\Delta}/\sqrt{2\pi} + P_0. \tag{18}
$$


The definitions of two parameters in that expression ($A_D$
and $A_0$) are modified from those in Ref. [9].

Figure 1 (right panels) shows typical 2D model fits
with Eq. (14) compared to corresponding data histograms
in the left panels. The fit residuals are consistent with
bin-wise statistical errors. The general evolution with
centrality is monotonic increase of the SS 2D peak and AS
dipole amplitudes (dijet structure), substantial increase
of the SS peak $\eta_\Delta$ width, rapid decrease to zero of the
1D Gaussian on $\eta_\Delta$ (soft component) [8] [9, :5D] and
nonmonotonic variation of the quadrupole amplitude [9].

For the present 1D study we develop simplified versions
of the 11-parameter model. In more-central Au-Au
collisions the soft component ($A_{soft}$) falls to zero amplitude,
and the BE/electron component ($A_{BE}$) becomes
very narrow [5]. A 2D model applicable to more-central
Au-Au collisions then has 6 parameters


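$$
A(\eta_\Delta,\phi_\Delta) = A_{2D} \exp\left\{-\frac{1}{2}\left[\left(\frac{\phi_\Delta}{\sigma_{\phi_\Delta}}\right)^2 + \left(\frac{\eta_\Delta}{\sigma_{\eta_\Delta}}\right)^2\right]\right\}
 + A_D [\cos(\phi_\Delta - \pi) + 1]/2
 + A_0 + A_Q\, 2\cos(2\phi_\Delta). \tag{15}
$$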


The BE/electron component remains significant in a few 
bins near the origin that can be removed from the fits. 

Projection onto 1D azimuth represents large information
reduction. The full 2D histogram with 25 x 25 bins
includes 169 independent bins (one independent quadrant
due to symmetrization), whereas 1D projections include
at most 13 independent bins. A simplified model
derived from the 2D data model but applicable to projected
1D azimuth correlations in more-central A-A collisions
includes 5 parameters defined to be consistent with


The 1D data histograms have been adjusted to ensure
$P_0 = 0$. A fit to bin-10 data with Eq. (17) determines an
offset value $A_0 = -0.14$. With other fitted parameter
values we obtain

$$
A_{1D}\, \sigma_{\phi_\Delta}/\sqrt{2\pi} = 0.57 \times 0.635/\sqrt{2\pi} = 0.144. \tag{19}
$$

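The four-parameter 1D model can then be further reduced
to a three-parameter model defined by

$$
A(\phi_\Delta) = A_{1D} \left\{ \exp\left[-\frac{1}{2}\left(\frac{\phi_\Delta}{\sigma_{\phi_\Delta}}\right)^2\right] - \sigma_{\phi_\Delta}/\sqrt{2\pi} \right\}
 - A'_D\, 2\cos(\phi_\Delta), \tag{20}
$$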


where each of the two model components integrates to zero
over $2\pi$. We therefore replace Eq. (17) with Eq. (20),
referred to below as the “basic Model.”
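
The zero-integral property of the Gaussian component and the arithmetic of Eq. (19) can be checked numerically. The following is a minimal sketch and not the authors' code: the helper names `gaussian_component` and `integrate_over_period` and the midpoint-rule integration are illustrative assumptions, and the parameter values are the fitted bin-10 values quoted in the text.

```python
import math

def gaussian_component(phi, a1d, sigma):
    """Gaussian term of the basic Model with its mean over 2*pi subtracted."""
    offset = sigma / math.sqrt(2.0 * math.pi)  # Gaussian integral divided by 2*pi
    return a1d * (math.exp(-0.5 * (phi / sigma) ** 2) - offset)

def integrate_over_period(f, n=2400):
    """Midpoint-rule integral of f over [-pi, pi)."""
    step = 2.0 * math.pi / n
    return sum(f(-math.pi + (i + 0.5) * step) for i in range(n)) * step

A1D, SIGMA = 0.57, 0.635  # fitted bin-10 values quoted in the text

# Offset implied by the Gaussian integral, as in Eq. (19)
offset_value = A1D * SIGMA / math.sqrt(2.0 * math.pi)

# Both model components should integrate to (nearly) zero over one period
gauss_integral = integrate_over_period(lambda p: gaussian_component(p, A1D, SIGMA))
dipole_integral = integrate_over_period(math.cos)
```

The Gaussian integral is not exactly zero because the Gaussian tails beyond $\pm\pi$ are cut off, but for this width the remainder is far below the statistical errors.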

Since all data histograms are corrected to $P_0 = 0$ to
remove the offset DoF the adjusted 13-bin 1D data histograms
have 12 independent DoF. But the bin at $\phi_\Delta = 0$
is removed from all model fits to exclude the BE/electron
component, reducing the effective data DoF to 11.
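
The bin and DoF counting used throughout can be written out explicitly. This is an illustrative sketch only; the helper name `unique_bins` is an assumption introduced here.

```python
def unique_bins(n_bins):
    """Independent bins of a histogram symmetrized about its central bin."""
    return (n_bins + 1) // 2

n_1d = unique_bins(25)        # independent azimuth bins after symmetrization
n_2d = n_1d * n_1d            # independent bins in one quadrant of the 25 x 25 histogram
data_dof = n_1d - 1 - 1       # offset fixed (P_0 = 0) and origin bin dropped
fit_dof_basic = data_dof - 3  # three-parameter basic Model
```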

The AS dipole component is the limiting case of an
AS Gaussian peak array (see App. D for details). The
r.m.s. peak width ($\sigma \approx \pi/2$) is large enough that only the
m = 1 AS dipole term of the PS representation survives.
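
A quick numerical check illustrates why only the dipole term matters for such a broad peak. The sketch assumes the standard result that the Fourier coefficients of a periodic Gaussian peak array of width $\sigma$ fall off as $\exp(-m^2\sigma^2/2)$; the function name `harmonic_ratio` is hypothetical.

```python
import math

def harmonic_ratio(m, sigma):
    """Magnitude of the cos(m*phi) coefficient relative to the m = 1 coefficient."""
    return math.exp(-0.5 * (m * m - 1) * sigma * sigma)

sigma_as = math.pi / 2.0  # approximate AS peak width quoted in the text
ratios = {m: harmonic_ratio(m, sigma_as) for m in (1, 2, 3)}
```

Under these assumptions the m = 2 coefficient is a few percent of the dipole and higher terms are smaller still.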

We define alternative data models by adding to the basic
Model of Eq. (20) successive cosine terms of the form
$A_X\, 2\cos(m\phi_\Delta)$, where X = Q, S, O for quadrupole,
sextupole and octupole (m = 2, 3, 4). We also define
independent “FS-only” models as truncated Fourier series
with K cosine terms and no other components.


















VI. BIN-10 0-5% AZIMUTH CORRELATIONS 

We first apply BI methods to the 1D azimuth projection
from 0-5% central 200 GeV Au-Au collisions. We fit
the data with the basic Model and obtain the data PS.
We determine $\chi^2$ and information I for FS-only models
vs parameter number K. We then evaluate evidence E
for several competing models and determine the posterior
model probabilities.


B. Data power spectrum 

Figure 3 shows the PS (points and blue solid curve) as
a Fourier transform of the data autocorrelation in Fig. 2
using Eq. (12). The general structure includes a signal
component at smaller wave number m < 4 and a flat
(on average) white-noise spectrum at larger wave number
corresponding to the r.m.s. statistical error in the
data histogram. The noise-spectrum mean is about 0.001
(dotted line).


A. 1D azimuth projection


Figure 2 shows a projection of the 2D data histogram
from 0-5% central 200 GeV Au-Au collisions onto 1D $\phi_\Delta$
(points). 24 bins are shown but only 13 are unique due to
symmetrization about zero and $\pi$. Estimated statistical
errors have been multiplied by factor 2 to make them visible
(extend outside the points). Errors are a factor $\sqrt{2}$
larger for the bins at 0 and $\pi$ because of symmetrization
of the data about those bins. The bin at zero also includes
a significant contribution from BE/electrons not included
in the models used for this exercise and is therefore
excluded from all fits. The bin at $\pi$ includes a small excess
due to a tracking-geometry distortion accommodated in
some model fits by addition of a “delta function.”

FIG. 2: (Color online) 1D projection onto azimuth (points)
from the 2D data histogram for 0-5% central 200 GeV Au-Au
collisions in Fig. 1 (c). Statistical errors at 0 and $\pi$ are a
factor $\sqrt{2}$ larger than the others due to symmetrization of
data on the periodic variable. The bin-wise statistical errors
0.0037 have been multiplied by 2 to make them visible outside
the data points. The (red) dashed curve is obtained from a fit
to the data with the basic Model of Eq. (20). A fit with an FS-only
model including four or more terms would appear identical
on the scale of this plot. A similar remark applies to
corresponding data plots for two other centrality bins.


A fit of the basic Model to data is shown by the dashed
(red) curve. The fitted model parameters are $A_{1D} = 0.57 \pm 0.007$,
$\sigma_{\phi_\Delta} = 0.635 \pm 0.007$ and $A'_D = 0.115 \pm 0.002$
with $\chi^2 = 12.5$ for 11 - 3 = 8 fit DoF.



FIG. 3: (Color online) Power spectrum values $P_m$ (points)
derived from the data in Fig. 2 via Eq. (12). The (red) dashed
curve is the Gaussian PS described by Eq. (21) with width
and amplitude corresponding to the fitted Gaussian in Fig. 2.
Interval m > 5 is consistent with a “white-noise” power spectrum
(dotted line) representing the statistical noise in Fig. 2.


To aid interpretation of the data PS we include the
predicted PS for a 1D Gaussian (red dashed curve) with
amplitude and width derived from the basic-Model fit in
Fig. 2. The PS amplitudes for a unit-amplitude periodic
Gaussian peak array on $\phi_\Delta$ are given by (App. D)

$$
2P_m(\sigma_{\phi_\Delta}) = \sqrt{2/\pi}\; \sigma_{\phi_\Delta} \exp\left(-m^2 \sigma_{\phi_\Delta}^2/2\right). \tag{21}
$$


As the Gaussian peak width $\sigma_{\phi_\Delta}$ increases the number of
significant signal terms in the PS decreases. The Gaussian
PS coincides with the data PS for $m \in [2,5]$, and
the data PS for m > 5 is consistent with statistical noise.
The data PS element for m = 1 includes a negative
contribution $-A'_D = -0.115$ from the AS peak (dipole).
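
The dependence of the Gaussian PS on peak width can be illustrated with a short calculation. This sketch evaluates the form of Eq. (21), scaled linearly by a peak amplitude, for two of the fitted (amplitude, width) pairs quoted in this paper. The function name `gaussian_ps` and the linear amplitude scaling are assumptions of the sketch, and it does not settle whether the published figures plot $P_m$ or $2P_m$.

```python
import math

def gaussian_ps(m, amplitude, sigma):
    """P_m of a periodic Gaussian peak array: half of Eq. (21), times the peak amplitude."""
    return 0.5 * amplitude * math.sqrt(2.0 / math.pi) * sigma * math.exp(-0.5 * (m * sigma) ** 2)

narrow = [gaussian_ps(m, 0.57, 0.635) for m in range(1, 7)]   # bin-10 fitted values
broad = [gaussian_ps(m, 0.073, 0.926) for m in range(1, 7)]   # bin-0 fitted values

# The ratio P_4 / P_1 measures how quickly each spectrum falls off with m.
falloff_narrow = narrow[3] / narrow[0]
falloff_broad = broad[3] / broad[0]
```

The broader peak falls off much faster with m, so fewer Fourier terms rise above a fixed noise level.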

We can assess the quality of the basic-Model data description
by determining the PS of the residuals, not of
(data $-$ Model) but of (data $-$ Gaussian) only. The PS
of the residuals should be equal to the PS difference in
Fig. 3 according to the linearity of Eq. (12).

Figure 4 shows the PS for (data $-$ Gaussian) referring
to the fitted Gaussian in Fig. 2. The PS values for m > 1
are consistent with the white-noise spectrum. The value
for m = 1 (not shown) is consistent with the fitted dipole
amplitude. From this PS study we have a first indication
that the K = 3 basic Model is sufficient to describe the
bin-10 1D azimuth projection.

FIG. 4: (Color online) The PS for residuals in the form (data
$-$ Gaussian) from Fig. 2, consistent with the white-noise part
of the PS in Fig. 3 with mean approximately 0.001. The
negative PS value for m = 1 (not shown) corresponds to the
AS dipole amplitude $-A'_D$ from the basic-Model fit in Fig. 2.


minimum for $-2LE$ (and maximum for evidence E) occurs
at K = 4, indicating the FS-only model preferred by
the data. That result is consistent with Fig. 3, indicating
a K = 4 FS-only model should exhaust the PS signal.

The $\chi^2$ trend indicates that the FS model components
are ideally ordered on index $k \in [1,K]$ for the signal in
these specific data, and is similar to the idealized trend
suggested in Fig. 5.1 of Ref. EZj. The largest decreases 
occur for the smallest index values. The interval with
larger (negative) slope at smaller K corresponds to
accommodation of the data signal with increasing K. The
interval with smaller slope at larger K indicates that
additional Fourier terms only accommodate statistical
noise. The overall $\chi^2$ trend then matches the
power-spectrum trend in Fig. 3. $\chi^2$ must go to zero when K equals
the number of data DoF (11 in this case). The lower solid
curve represents the $\chi^2$ for fits to the residuals (data $-$
basic Model) from Fig. 2 (no signal present). The $\chi^2$
values are then consistent with the fit DoF $\approx 11 - K$.


C. Bayesian model fits with Fourier series 


D. Bayesian model comparisons 


We next apply BI methods to FS-only models of the
data histogram in Fig. 2 to establish a BI reference. In
this application the number of parameters K represents
the largest value of FS index k for a given FS-only model.
Varying K represents different FS data models. We obtain
the $\chi^2$ and information I for each FS-only model.



We next extend BI methods to several data models
with different combinations of elements and parameters
compared to the previous FS-only exercise. We first compare
$\chi^2$ alone, simulating a conventional model-fit exercise,
then extend to comparisons of evidence $E(D^*|H)$.



FIG. 5: (Color online) $\chi^2$ (upper solid curve and points)
and information $2I$ (dashed curve and points with uncertainty
band) vs number of parameters K for Fourier-series (FS-only)
models. The sum (negative log evidence, $-2LE$, dotted curve) is
also included. The lower solid curve is $\chi^2$ values for fits to
residuals (data $-$ basic Model) from Fig. 2, consistent with the
trend $11 - K$ expected for no signal (noise only) in the data.


Figure 5 shows the basic elements of BI model fits. The
upper (blue) solid curve and points represent the log
likelihood (LL) in the form $-2LL$ or $\chi^2$. The (red) dashed
curve shows information $2I$ representing the parameter
cost (Occam penalty, Sec. IIF). The (black) dotted curve
represents the sum $-2LE$ (negative log evidence). The

FIG. 6: (Color online) $\chi^2$ values vs number of parameters K
for several data models. The general trend is monotonic
decrease with increasing number of model parameters, responding
only to statistical noise with $\chi^2 \approx 11 - K$ for K > 4.

Figure 6 shows $\chi^2$ values for various model fits to data.
The FS-only description (blue points and line) achieves
a substantial decrease for K = 4 but no significant
improvement with additional terms. The basic Model with
three parameters (red solid square) has $\chi^2 \approx 12.5$,
somewhat in excess of the number of fit DoF = 8. Addition
of more cosines (quadrupole, sextupole, octupole) to the
basic Model keeps pace with the FS noise trend with
its reduced slope. In this conventional context the extra
cosines seem to be required for competitive data description
because they reduce the fitted $\chi^2$, but at what cost?

The basic Model + quadrupole + sextupole + octupole
with K = 6 (open diamond) has the same $\chi^2$ as the K =
4 FS-only model. As explained below in connection with
Table I the additional cosine terms effectively displace
the Gaussian part of the basic Model. The composite
model then functions as a FS-only model with K = 4,
but with increased cost in the Occam penalty.



FIG. 7: (Color online) Negative log evidence $-2LE$ vs number
of parameters K for several models. The basic Model
(solid square) is strongly favored over all others (lowest $-LE$).
The hatched band indicates the common uncertainty of priors
assigned to cosine terms in all models. FS-only models for all
K (solid dots and line) are strongly rejected by the evidence.


Figure 7 shows negative log evidence $-2LE = \chi^2 + 2I$
for several models. Adding an Occam penalty in the form
of information I gained by each model reveals a different
picture. The basic Model with K = 3 has substantially
smaller $-2LE$ (larger evidence E) than other models
where the cost of extra parameters is not justified by a
compensating reduction in $\chi^2$. The hatched band reflects
the estimated uncertainty in I (for the FS-only model)
arising from the estimated priors.

Given that $\chi^2$ values for various models are similar
($\approx 11 - K$) the large differences in $-LE$ among models must
be dominated by information I which depends on the
covariance matrix and prior PDFs. It might be suggested
that such differences arise mainly from the assignment of
prior probabilities, but that is not the case. We apply
the same prior to a given parameter or parameter class
consistently across all models, so that uncertainties in
I are strongly correlated across competing models and
largely cancel when odds ratios are taken (see Sec. IX C).

The LE trend vs K for FS-only models arises from
$2I \approx 10K$, whereas the LE trend for the basic Model
plus additional cosine terms corresponds to $2I \approx 5K$.
The difference in I/K of 2.5 corresponds to a factor
$\exp(2.5) \approx 12$ difference in parameter errors for the
two models. Parameter errors for FS-only models are
O(0.001) whereas errors for the basic Model plus cosine
terms are O(0.01), accounting for the factor 10-15 difference.
As discussed in Sec. IX B the large Occam penalty
for FS-only models is mainly owing to smaller
covariance-matrix elements (parameter errors).

FIG. 8: (Color online) Normalized $p(H_l|D^*)$ from Eq. (22)
for several models indexed by l. As in Fig. 7 the basic Model
(solid square) is strongly favored over all other models while
FS-only models for all K are strongly rejected.


Figure 8 shows the plausibility (relative evidence) for
each competing model in the form

$$
p(H_l|D^*) = \frac{E(D^*|H_l)\, p(H_l)}{\sum_{l'} E(D^*|H_{l'})\, p(H_{l'})} \tag{22}
$$

that reveals the full selectivity of the BI method. For
this exercise we assume that model prior probabilities
$p(H_l)$ are all equal (and therefore irrelevant). However,
implicit $p(H_l)$ assumptions do play a role in RHIC/LHC
data modeling and physics interpretations.
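
The normalization in Eq. (22) is straightforward to evaluate from $-2LE$ values. The following is a minimal sketch, assuming equal model priors unless others are supplied; the function name `model_probabilities` and the three input values are hypothetical and are not results from this paper.

```python
import math

def model_probabilities(neg2_log_evidence, priors=None):
    """Normalized posterior model probabilities from -2 ln(evidence) values."""
    n = len(neg2_log_evidence)
    priors = priors or [1.0 / n] * n
    best = min(neg2_log_evidence)  # subtract the minimum for numerical stability
    weights = [p * math.exp(-0.5 * (x - best))
               for x, p in zip(neg2_log_evidence, priors)]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical -2LE values for three competing models (illustration only)
probs = model_probabilities([10.0, 12.0, 16.0])
```

Only differences in $-2LE$ enter, so a common additive shift of all values leaves the probabilities unchanged.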

The most plausible models are the basic Model (80%)
and basic Model + quadrupole (15%). Large Occam
penalties reduce competing additional multipole elements
to a few percent or less. The model including an octupole
(open diamond) leads to major fit instabilities and is
rejected. With plausibilities of less than 1% FS-only models
are also rejected. In terms of odds the basic Model is
preferred over Model + quadrupole by 4.6 ± 0.7:1, over
Model + sextupole by 28.0 ± 4.8:1 and over all FS-only
models by 360 ± 42:1.

As noted in Section II-E Bayesian comparisons among
models are effected by taking ratios of evidences (odds
ratios). Comparisons are visualized efficiently by
corresponding differences on a log-evidence scale (Bayes
factors) as in Fig. 7 and subsequent equivalent figures.
Isolated absolute numbers are not relevant to our method.


E. Model-fit results for bin 10 

Table I summarizes the best-fit model-parameter values
w obtained from model fits (minimum $\chi^2$) emphasizing
the basic Model (column 2) and successive additions
of quadrupole, sextupole and octupole components, as
well as a delta function at $\pi$ to accommodate a data
artifact. The parameters are as defined in Sec. V B. Also
shown are $\chi^2$ and the BI parameters $2I$ and $-2LE$.


TABLE I: Bin-10 model parameters (minimum $\chi^2$) for several
fit models: (a) basic Model, (b) basic Model plus quadrupole
term $A_Q$, (c) previous plus sextupole term $A_S$, (d) previous
plus octupole term $A_O$, (e) basic Model plus delta function
at $\pi$. The fit parameters are as defined in Sec. V B.


A. 1D azimuth projection

Figure 9 shows a projection of the 2D data histogram
from 9-18% central 200 GeV Au-Au collisions onto 1D $\phi_\Delta$
(points). As for the previous centrality the bin at zero
also includes a significant contribution from BE/electrons
not included in the data models and is therefore excluded
from the fits. The typical data r.m.s. statistical error is
0.0026, not visible outside the points on this scale.


parameter              basic Model     + A_Q             + A_S     + A_O     + δ
A_1D                   0.57 ± 0.007    0.73 ± 0.09       0.84      0.34      0.57
σ_φΔ                   0.64 ± 0.007    0.69 ± 0.02       0.71      0.09      0.63
A'_D                   0.12 ± 0.002    0.15 ± 0.02       0.18      -0.003    0.115
A_Q                    -               -0.014 ± 0.007    -0.025    0.064     -
A_S                    -               -                 0.005     0.024     -
A_O                    -               -                 -         0.005     -
A_δ                    -               -                 -         -         0.005
χ²                     12.5            9.7               10        9         11
2I                     28              34                38        44        36
-2LE                   40.5            42.5              48        53        47


Results for the basic Model are in good agreement with
the published values from 2D model fits [9]. For this
centrality the best-fit 2D parameters from the model of
Eq. (14) are $A_{2D} = 0.65 \pm 0.04$, $\sigma_{\phi_\Delta} = 0.63 \pm 0.015$,
$A_0 = -0.14 \pm 0.014$ (consistent with the Gaussian integral),
$A_D = 0.224 \pm 0.002$ ($\approx 2A'_D$), $A_Q = 0.001 \pm 0.008$ with
$\chi^2$/DoF = 2.6. Note that $A_{1D}$ must be less than $A_{2D}$
because of the curvature on $\eta_\Delta$ of the SS 2D peak. That
$\chi^2$/DoF of the 2D model fit is substantially higher
than the value for the 1D fit with the basic Model [12.5 /
(11 - 3) = 1.6] because of significant structure on $\eta_\Delta$
($\eta$-modulated dipole) not described by the standard 2D
data model of Eq. (14).


As cosine terms are added to the basic Model a conflict
develops between the explicit Gaussian component
and a sum of cosines approximating a competing Gaussian.
The large parameter differences for “+$A_O$” vs “basic
Model” columns are discussed further in Sec. IX C.
The +δ column refers to the basic Model plus a free
amplitude in the bin at $\pi$ (“delta function”). Compared
to the basic Model alone there is reduction of $\chi^2$ by 1.5
but increase of information $2I$ by 8 leading to overall
increase of negative log evidence $-2LE$ by 6.5. The
additional model DoF is rejected by exp(3.25) → 25:1.
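
The conversion from a difference in $-2LE$ to odds can be reproduced directly from the Table I entries for the basic Model and the basic Model plus delta function. This is a minimal sketch; the function name `odds_from_neg2le` is an assumption introduced here.

```python
import math

def odds_from_neg2le(neg2le_a, neg2le_b):
    """Odds in favor of model A over model B from their -2 ln(evidence) values."""
    return math.exp(0.5 * (neg2le_b - neg2le_a))

# Table I entries: basic Model vs. basic Model + delta function
chi2_basic, info2_basic = 12.5, 28.0
chi2_delta, info2_delta = 11.0, 36.0

neg2le_basic = chi2_basic + info2_basic   # -2LE = chi^2 + 2I
neg2le_delta = chi2_delta + info2_delta
odds = odds_from_neg2le(neg2le_basic, neg2le_delta)
```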


VII. BIN-8 9-18% AZIMUTH CORRELATIONS 

In this second of three examples the statistical errors
of the wider centrality bin are reduced by factor $\sqrt{2}$
compared to the 0-5% centrality bin. The BE/electron peak
is still narrow enough to remain within the single bin at
zero. The quadrupole component is significant and positive,
shifting the plausibility order of competing models.

FIG. 9: (Color online) 1D projection onto azimuth (points)
from the 2D data histogram for 9-18% central 200 GeV
Au-Au collisions. The (red) dashed curve is a fit to the data
with the basic Model of Eq. (20) plus independent quadrupole
component $A_Q\, 2\cos(2\phi_\Delta)$. The bin-wise statistical errors are
0.0026, not visible outside the points.


A fit of the basic Model + quadrupole to data is shown
by the dashed (red) curve. The fitted model parameters
are $A_{1D} = 0.926 \pm 0.088$, $\sigma_{\phi_\Delta} = 0.727 \pm 0.018$,
$A'_D = 0.206 \pm 0.022$ and $A_Q = 0.068 \pm 0.006$ with $\chi^2 = 16.5$ for
11 - 4 = 7 fit DoF.


B. Data power spectrum 


Figure 10 shows the power spectrum (points and blue
solid curve) derived from the data in Fig. 9. As for the
0-5% centrality bin we include the predicted power spectrum
(red dashed curve) for a 1D Gaussian (SS peak)
with amplitude and width parameters derived from the
fit to data in Fig. 9. The data PS is again consistent with
statistical noise for m > 5. The PS element for m = 1
includes a negative contribution from the AS dipole. The
element for m = 2 includes a significant positive contribution
from a quadrupole component not associated with
the SS peak [29].

Just as for bin 10 we assess the quality of the basic-Model
data description by determining the PS of the
residuals of (data $-$ Gaussian) only, where Gaussian is
the fitted Gaussian in Fig. 9. The PS for (data $-$ Gaussian)
is consistent with a white-noise spectrum with mean
value $\approx 0.0013$ for m > 2. The values for m = 1, 2 are
consistent with the fitted negative dipole and positive
quadrupole amplitudes, respectively. The basic Model augmented by
quadrupole component $\cos(2\phi_\Delta)$ fully exhausts the data
signal and is therefore a sufficient model.

FIG. 10: (Color online) Power spectrum values $P_m$ (points)
derived from the data in Fig. 9 via Eq. (12). The (red) dashed
curve is the Gaussian PS described by Eq. (21) with amplitude
and width from the basic Model + quadrupole fitted to data in
Fig. 9. The interval m > 5 is consistent with a “white-noise”
power spectrum (dotted line) representing the statistical noise
in Fig. 9.


C. Bayesian model fits with Fourier series 

The log-likelihood LL trend in the form $\chi^2$ for K > 2
for FS-only model fits to data from bin 8 (not shown)
is similar to that for bin-10 data in Fig. 5. Information
$2I$ representing the parameter cost is also similar. The
minimum of $-2LE$ occurs at K = 4, consistent with
Fig. 10 where we again find that a K = 4 FS-only model
should completely describe the signal in the bin-8 data.
The FS-only model should then be competitive with the
K = 4 basic Model + quadrupole in terms of fit quality
and parameter number, two elements of BI evaluation.


D. Bayesian model comparisons 

Figure 11 shows $\chi^2$ values from conventional data modeling.
The FS-only model achieves a substantial reduction
for K = 4 but no significant improvement for additional
terms. The basic Model with K = 3 (solid red
square) has a $\chi^2$ much elevated from the number of fit
DoF = 8 and is rejected on that basis. Addition of a
quadrupole component (solid green diamond) brings $\chi^2$
down to an acceptable value. Addition of more cosines
(sextupole, octupole) to the basic Model + quadrupole
tracks the FS-only noise accommodation.

The basic Model + sextupole (solid red triangle) has
the same $\chi^2$ value as that for basic Model + quadrupole.
The Gaussian + dipole + sextupole combination can interact
to accommodate the independent quadrupole component
in the data, since the octupole component of the
Gaussian is only a few sigma above the statistical noise.
Interactions among the basic Model Gaussian and additional
cosine terms are discussed in Sec. IX C.

FIG. 11: (Color online) $\chi^2$ values vs number of parameters K
for several data models. The general trend is again monotonic
decrease with increasing parameter number.



FIG. 12: (Color online) Negative log Evidence —2 LE vs 
number of parameters K for several models. The basic Model 
+ quadrupole (solid diamond) is strongly favored over others 
(lowest —LE, largest evidence). The basic Model alone (solid 
square) is strongly rejected by the evidence, as are FS-only 
models for all K (blue points and line). 


Figure 12 shows negative log evidence $-2LE$ for various
models. The basic Model + quadrupole (solid green
diamond) corresponding to K = 4 model DoF has substantially
smaller $-2LE$ (larger evidence E) than other
model combinations. It is clearly preferred over the basic
Model alone by exp(5) → 187 ± 30:1 odds. For other
models the cost of extra parameters is not justified by
reductions in $\chi^2$. The quadrupole model component is
preferred over a sextupole by exp(2) → 8.5 ± 1.4:1 due
to differences in the fit covariance matrix for the two
models. All FS-only models are again rejected by large
factors.

VIII. BIN-0 83-94% AZIMUTH CORRELATIONS

In this third of three cases, essentially representing p-p
(N-N) collisions, we encounter a major challenge for BI
analysis from several sources: (a) The SS peak on azimuth
contains two contributions that cannot be separated
easily by discarding the bin at the origin as they
were for bins 8 and 10, (b) the signal amplitude is much
smaller relative to statistical noise (15:1) than it was for
more-central collisions (200:1), and (c) the SS peak is
substantially broader on azimuth.


A. 1D azimuth projection

Figure 13 shows a projection of the 2D data histogram
from 83-94% central 200 GeV Au-Au collisions (points).
Unlike previous cases the SS peak includes a significant
contribution from BE/electrons that is not included
in the models (conversion electron pairs do fall mainly
within the single bin at the origin). A fit of the basic
Model to data is shown by the dashed (red) curve.
The fitted model parameters are $A_{1D} = 0.073 \pm 0.025$,
$\sigma_{\phi_\Delta} = 0.926 \pm 0.128$ and $A'_D = 0.022 \pm 0.006$ with
$\chi^2 = 12.0$ for 11 - 3 = 8 fit DoF.

FIG. 13: (Color online) 1D projection onto azimuth (points)
from the 2D data histogram for 83-94% central 200 GeV
Au-Au collisions in Fig. 1 (a). The (red) dashed curve is a fit
to the data with the basic Model of Eq. (20). The statistical
errors are 0.0026.


B. Data power spectrum 


Figure 14 shows the power spectrum (points and blue
solid curve) derived from the data in Fig. 13. As for previous
centrality bins we include a predicted power spectrum
(red dashed curve) for a 1D Gaussian (SS peak)
with amplitude and width parameters derived from the
basic Model fitted to data in Fig. 13.

Because the bin-0 SS peak is broader on azimuth (thus
narrower on index m) and the S/N is much smaller the
PS signal is not significant at m = 4 or even m = 3. A
K = 2 FS-only model in the form of dipole + quadrupole
should be sufficient to displace the basic Model. For bins
10 and 8 FS-only models are clearly excluded in favor of
the basic Model, but for bin 0 the K = 3 basic Model
and a K = 2 FS can both describe the two data DoF.
Thus, we expect BI analysis to prefer the FS-only model.

FIG. 14: (Color online) Power spectrum values $P_m$ (points)
derived from the data in Fig. 13 via Eq. (12). The (red)
dashed curve is the Gaussian PS described by Eq. (21) with
Gaussian width and amplitude corresponding to the basic
Model fitted to data in Fig. 13. The interval m > 2 is consistent
with a “white-noise” power spectrum (dotted line)
representing the statistical noise in Fig. 13.


C. Bayesian model fits with Fourier series 


Figure 15 shows the FS-only $\chi^2$ trend for bin-0 data
(upper solid blue curve and points). As expected, $\chi^2$
drops to the noise trend (lower solid curve) by K = 2.
Additional terms accommodate statistical noise.
Information $2I$ follows the expected monotonic increase
$\approx 10K$. Negative log evidence $-2LE$ has a minimum
for K = 2. Thus, an FS-only model with K = 2 is preferred
by the data, as expected from the PS in Fig. 14.


D. Bayesian model comparisons 


Figure 16 shows $\chi^2$ trends for several competing models
applied to the bin-0 data in Fig. 13. The K = 3 basic
Model and FS-only model describe the data equally
well, and we expect the simpler K = 2 FS-only model
to be preferred when an Occam penalty is included. For
these bin-0 data the addition of a “delta” component at
$\pi$ (open square) leads to substantial improvement in the
fit quality, consistent with Fig. 13.

Figure 17 shows the negative log evidence $-2LE$ trend. That
the K = 3 basic Model (solid red square) is preferred
over the K = 2 FS-only model (lowest blue point) despite
the cost of the extra model parameter is a major
surprise. The evidence ratio (odds) is 3.3 ± 0.25:1
$\approx \exp(1.25)$. That result prompted a detailed study
reported in Sec. IX B on how information I is related to
model priors and data, with supporting material provided
in App. A.

FIG. 15: (Color online) $\chi^2$ (upper solid curve and points)
and information $2I$ (dashed curve and points) vs number of
parameters K for FS-only models. The sum $-2LE$ (dotted
curve and points) is also included. The $\chi^2$ trend for
basic-Model fit residuals (lower solid curve) is approximately
consistent with the expected noise trend $11 - K$.

FIG. 16: (Color online) $\chi^2$ values vs number of parameters
K for several data models. The basic Model (solid square) is
equivalent to the FS-only model with K = 3 (solid dot).

The ability in this case to discriminate between the basic
Model and FS models, despite 1D data with low S/N
ratio, is a significant achievement for BI analysis. The
correctness of the basic-Model preference is confirmed by
analysis of 2D data histograms. From the 2D analysis of
Ref. [9] we learn that the SS 2D peak is necessary for all
centralities. In contrast, a 1D FS-only model would fail
dramatically for any 2D data, but that is not apparent
from 1D projections alone.



FIG. 17: (Color online) Negative log evidence $-2LE$ vs
number of parameters K for several models. The K = 3 basic
Model (solid square) is favored over all others (lowest $-LE$,
largest evidence), especially over the K = 2 FS-only model
expected to prevail for this centrality (lowest solid dot).


IX. SYSTEMATIC UNCERTAINTIES 

Bayesian Inference methods provide a powerful system
for discriminating among competing complex data models
with a consistent set of evaluation rules. Close examination
of method details and evaluation of uncertainties
are required to ensure confidence in the results.


A. Uncertainties for data histograms 

For 2D histograms from Ref. [9] the angular acceptance
was divided into 25 bins on the $\eta_\Delta$ axis and 25 bins on
$\phi_\Delta$, a trade-off between statistical error magnitude and
angular resolution. The histograms are by construction
symmetric about $\eta_\Delta = 0$ and $\phi_\Delta = 0, \pi$. The 25 bins on
$\phi_\Delta$ actually span $2\pi + \pi/12$ to ensure centering of major
peaks on azimuth bin centers. 2D binwise statistical errors
are ±0.004 for 200 GeV data near $|\eta_\Delta| = 0$. Because
of the $\eta_\Delta$ dependence of the pair acceptance, statistical
errors increase with $|\eta_\Delta|$ as $\sqrt{\Delta\eta/(\Delta\eta - |\eta_\Delta|)}$ with
$\eta$ acceptance $\Delta\eta = 2$. Errors are uniform on $\phi_\Delta$ except that
errors are larger by factor $\sqrt{2}$ for angle bins with $\phi_\Delta = 0$
and $\pm\pi$ because of reflection symmetries.
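
The error scaling described above can be expressed as a small function. This is a sketch under the stated description only: the function name `bin_error`, its argument names, and the default values (taken from the numbers quoted in the text) are illustrative assumptions.

```python
import math

def bin_error(eta_delta, phi_on_symmetry_axis=False, base=0.004, acceptance=2.0):
    """Statistical error of a 2D bin following the scaling described in the text."""
    if abs(eta_delta) >= acceptance:
        raise ValueError("eta_delta must lie inside the pair acceptance")
    # Pair acceptance shrinks linearly with |eta_delta|, so errors grow accordingly
    err = base * math.sqrt(acceptance / (acceptance - abs(eta_delta)))
    if phi_on_symmetry_axis:  # bins at phi_delta = 0 or pi
        err *= math.sqrt(2.0)
    return err
```

For example, `bin_error(1.0)` is larger than `bin_error(0.0)` by a factor $\sqrt{2}$, since half of the pair acceptance remains at $|\eta_\Delta| = 1$.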

Statistical errors are approximately independent
of centrality for the per-particle statistical measure
$\Delta\rho/\sqrt{\rho_{ref}}$ over nine 10% centrality bins (0-8). An additional
factor $\sqrt{2}$ increase applies to the two most-central
centrality bins (9, 10) which split the top 10% of the total
cross section. After projection onto 1D azimuth for this
study the centrality bin 10 errors are about 0.0037 except
for the azimuth bins at 0 and $\pi$. Errors for the other
centrality bins (0, 8) are smaller by a factor $\sqrt{2}$, or 0.0026.
$\chi^2$ values for optimized models in this study determined
with those statistical errors are generally consistent with
the number of fit DoF = data DoF $-$ K (number of model
parameters), as demonstrated in Fig. 5. Thus, the statistical
and systematic uncertainties for data histograms
used in this study are both small and well understood.


B. Uncertainties for information estimation 


Information uncertainty is largely related to the choice
of prior PDFs for various model parameters and the
fitted-parameter uncertainties. We repeat the information
definition in Eq. (7)

$$
I(D^*|H) = -\ln\left[ \sqrt{(2\pi)^K \det C_K}\; p(w|H) \right], \tag{23}
$$


the natural log of prior volume V W (H) over posterior vol¬ 
ume V W (D*\H) in the model parameter space. The co- 
variance matrix Ck for a AT-parameter model is obtained 
from the Hessian describing the curvatures of the likeli¬ 
hood function near its maximum. The likelihood function 
is in turn determined by model H in combination with 
specific data D*. If the model and prior are defined and 
data specified the posterior volume is also well defined. 

Assuming translation invariance within a parameter-space
volume where the likelihood is significantly nonzero
the prior PDF for parameter $w_k$ is taken to be uniform
across a bounded interval $\Delta_k$, into which the corresponding
fitted parameter value should almost certainly fall.
The prior volume for K parameters is then

$$1/p(w|H) = \prod_{k=1}^{K} \Delta_k = V_w(H). \tag{24}$$
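
As a rough sketch of how Eqs. (23) and (24) combine, the information can be evaluated from a fit covariance matrix and a list of uniform prior intervals. The function names are assumptions made here for illustration, and the log-determinant is computed with a plain Cholesky factorization so that no external library is needed; this is not the authors' code.

```python
import math

def log_det_spd(matrix):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    log_det = 0.0
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] - sum(lower[i][m] * lower[j][m] for m in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                lower[i][j] = math.sqrt(s)
                log_det += 2.0 * math.log(lower[i][j])
            else:
                lower[i][j] = s / lower[j][j]
    return log_det

def information(covariance, prior_intervals):
    """I = ln(prior volume / posterior volume) for uniform priors.

    Prior volume: product of the prior intervals (Eq. 24).
    Posterior volume: sqrt((2 pi)^K det C) from the fit covariance (Eq. 23).
    """
    k = len(prior_intervals)
    log_prior_volume = sum(math.log(width) for width in prior_intervals)
    log_posterior_volume = 0.5 * (k * math.log(2.0 * math.pi) + log_det_spd(covariance))
    return log_prior_volume - log_posterior_volume
```

Working with logarithms throughout avoids underflow when the posterior volume is many orders of magnitude smaller than the prior volume.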

In principle, a prior is defined before data D* are ob¬ 
tained and thus should not depend on specific data. How¬ 
ever, it is fair to invoke general knowledge ( Q ) about the 
typical amplitudes of structures in such data. We know
from experience that typical structure amplitudes (e.g.
peak-to-peak excursions) are generally $< O(1)$. That applies
for example to the Gaussian amplitude in the basic
Model, and to the Gaussian width based on the definition
of the SS peak, implying that $\Delta_k \approx 1$ in those cases.

What matters more than absolute estimates of $\Delta_k$ is
the relations among different models and model parame¬ 
ters. If prior-interval estimates are excessive for a partic¬ 
ular model it may be unduly penalized. Given the above 
assignment for a Gaussian amplitude, what is a fair as¬ 
signment for cosine coefficients? To that end we examine 
Eq. The autocorrelation to be modeled on the left
receives contributions at the origin from several FS components
Pm including factors 2. Thus, it is reasonable to
assume that cosine coefficients and uncertainties may be
substantially smaller on average than the Gaussian amplitude
and uncertainty. For all cosine components we
assign $\Delta_k = 1/3$ and indicate prior-related uncertainties
by including $\Delta_k = 1$ and $\Delta_k = 1/5$ as limiting cases for
cosines (e.g. curve I and hatched band in Fig. 5).


If we assume equal prior intervals $\Delta_k$ and equal variances
$\sigma_k^2$ for K model parameters and negligible covariances
among parameters information I simplifies to

$$I = K \ln\left[\frac{\Delta_k}{\sqrt{2\pi}\,\sigma_k}\right] + \text{constant}. \tag{25}$$


In fits with FS-only models we observe $\sigma_k \approx O(0.0007)$.
Given $\Delta_k \approx 1$ we have $\ln[\Delta_k/(\sqrt{2\pi}\,\sigma_k)] \approx 6$, while if
we reduce to $\Delta_k \to 1/3$ (assumed for all cosine terms)
$\ln[\Delta_k/(\sqrt{2\pi}\,\sigma_k)] \approx 5$. If we further reduce $\Delta_k \to 1/5$
with $\ln[\Delta_k/(\sqrt{2\pi}\,\sigma_k)] \approx 4.7$ the fitted parameter values in
some cases contradict the prior, implying that the chosen
prior interval is too small. We can then state that
for all FS-only models $I/K = 5.3 \pm 0.6$. For the basic
Model with added cosines the parameter uncertainties
are more typically $\sigma_k \approx O(0.01)$. In that case we obtain
$I/K = 2.6 \pm 0.5$. Those results imply that addition
of a model parameter is justified ($-2\,{\rm LE} = \chi^2 + 2I$ is
significantly reduced) if the resulting decrease in $\chi^2$ is
significantly greater than $2I/K \approx 10$ for FS-only models
and $\approx 5$ for the basic Model plus optional cosines.
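
The per-parameter numbers quoted above follow directly from the logarithm of the prior interval over $\sqrt{2\pi}$ times the parameter error. The short sketch below reproduces that arithmetic; the function names are illustrative assumptions, and the inputs are the prior intervals and typical errors stated in the text.

```python
import math

def info_per_parameter(prior_interval, sigma):
    """ln[prior_interval / (sqrt(2 pi) * sigma)], the information per parameter."""
    return math.log(prior_interval / (math.sqrt(2.0 * math.pi) * sigma))

def chi2_drop_needed(prior_interval, sigma):
    """Decrease in chi^2 that one added parameter must exceed (2 * I/K)."""
    return 2.0 * info_per_parameter(prior_interval, sigma)

if __name__ == "__main__":
    for label, sigma in (("FS-only", 0.0007), ("basic Model + cosines", 0.01)):
        for interval in (1.0, 1.0 / 3.0, 1.0 / 5.0):
            value = info_per_parameter(interval, sigma)
            print(f"{label:>22s}  interval={interval:.3f}  I/K={value:.2f}")
```

With a prior interval of 1/3 the threshold `chi2_drop_needed` comes out near 10 for the FS-only error scale and near 5 for the basic Model plus cosines, in line with the statement above.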


C. Uncertainties for odds ratios 

As noted in Sec. III E odds ratios can be used to state
quantitatively the BI relation between two models in
the form of a probability ratio $p(D^*|H_1)/p(D^*|H_2) \to E(D^*|H_1)/E(D^*|H_2)$,
where equality to the second ratio
assumes equal model priors p(H) for the two cases.
In terms of log evidence LE the Bayes Factor is
$B_{12} = \ln[E(D^*|H_1)/E(D^*|H_2)]$, and the odds is then $\exp(B_{12})$.
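
A minimal sketch of the conversion from log evidences to a Bayes factor and odds, assuming equal model priors as stated above. The function names and the numerical log-evidence values in the usage note are hypothetical.

```python
import math

def bayes_factor(log_evidence_1, log_evidence_2):
    """B_12 = ln[E_1 / E_2], computed as a difference of log evidences."""
    return log_evidence_1 - log_evidence_2

def odds(log_evidence_1, log_evidence_2):
    """Odds of model 1 over model 2 for equal model priors: exp(B_12)."""
    return math.exp(bayes_factor(log_evidence_1, log_evidence_2))
```

For instance, two hypothetical log evidences that differ by 1.25, such as `odds(-10.0, -11.25)`, give odds of about 3.5 : 1.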

The uncertainty (error) in an odds ratio is determined
by the uncertainties in the compared evidences,
in turn dominated by uncertainties in the covariance
matrix/Hessian and the prior PDFs. Uncertainties for the
Hessian matrix are discussed in App. [5] and serve as the
sole basis for the odds errors stated in the text.

Uncertainties for the prior PDFs are discussed in the 
previous subsection. The priors for SS Gaussian ampli¬ 
tude and width are set to the minimum values consis¬ 
tent with experience, disfavoring the basic Model a priori 
and implying that any odds favoring the basic Model is 
a lower limit. A common uncertainty of a factor 2 ei¬ 
ther way is assumed for a cosine coefficient in any model. 
Because an odds estimate is a probability ratio system¬ 
atic errors correlated between numerator and denomina¬ 
tor cancel in first order, whereas uncorrelated random 
errors should combine quadratically. That property can 
be seen as an advantage for odds as a basis for model 
comparisons and minimizes the uncertainty contribution 
from cosine elements common to two compared models. 

If models with different K values (parameter number) 
are compared the unpaired systematic error is not can¬ 
celed. For instance, the odds between the basic Model 
(K = 3) vs FS-only model (K = 4) for bin 10 includes 
a linear dependence on the FS-only prior uncertainty for 
one additional cosine term. However, the comparison of
basic Model plus quadrupole vs FS-only model for bin 8 
(both K = 4) eliminates that uncertainty contribution. 

X. DISCUSSION 

We consider several issues that have arisen in applica¬ 
tion of BI methods to azimuth-correlation data models, 
including surprising performance of the 1D basic Model
in peripheral collisions, consistent strong preference for 
the basic Model by BI analysis, competition between 
Gaussian and cosine terms in data models, and impli¬ 
cations from this study for two theoretical narratives. 


A. Comparing bin 10 and bin 0


The data structure for centrality bin 10 in Fig. 2 could
be modeled as (a) two peaks at 0 and $\pi$, (b) a Fourier
series only, or (c) a combination of such elements. The
two peaks described by the basic Model are expected in
a HEP/jets narrative describing high-energy nuclear collisions.
FS models are expected in a QGP/flow narrative
and are capable of describing any structure on periodic
azimuth. Competition among data models thus reflects
competition between theoretical narratives.

In Fig. 3 we learn that all information in the data PS
is confined to $m \in [1,4]$. Higher terms in an FS model
describe only statistical noise. Comparing a PS Gaussian
model with the data PS we find that four points are
predicted by a Gaussian fitted to data, and one point
corresponds to the fitted dipole within the basic Model.
The K = 3 basic Model fully represents the data signal
as demonstrated in Fig. 4 but so does a K = 4 FS-only
model. Intermediate combinations of Gaussian + cosines
also describe the data well. Models with more parameters
continue to reduce $\chi^2$ as in Fig. 6 and might be
preferred on that basis.

However, when an Occam penalty is introduced in the
form of information I dramatic differences among models
appear, as in Figs. 7 and 8. In the latter figure the basic-Model
probability is $p(H_i) \approx 80\%$, the next highest being
basic Model + quadrupole with $p(H_i) \approx 15\%$. Adding
more cosine components may reduce $\chi^2$, but not to an
extent that compensates large Occam penalties (increase
in I). The additions are essentially “fitting the noise”
and are strongly rejected by BI analysis.

A different situation emerges for bin 0. In Fig. 14 the
data signal is confined to $m \in [1,2]$ for two reasons: (a)
the S/N ratio is reduced by a factor 13 and (b) the SS
peak azimuth width is increased by 30% so the conjugate
PS signal peak width is reduced by that factor. Consequently
the “bandwidth” of the data PS signal is reduced
from $m \in [1,4]$ to $m \in [1,2]$. A K = 2 FS-only model
with two parameters should then be strongly preferred by
BI analysis over the K = 3 basic Model, given equivalent
priors for the two models.

However, that is not what we find in Fig. 17. The basic
Model maintains a significant advantage over a K = 2
FS-only model, the odds ratio being $\approx \exp(1.25) = 3.5 : 1$
in favor of the basic Model, even with one more parameter.
That surprising result led to the detailed comparisons
in the next subsection and the study in App. A.
The 1D projection is not the only information we have
about the source of these data. The unprojected 2D histogram
in Fig. 1 (a) clearly indicates that a SS 2D peak
model is required by bin-0 data, and a 2D FS-only model
would be rejected by a large factor [52]. The 2D observations
contribute a larger Bayesian context ( Q ) applicable
to this 1D BI study. The FS-only model is ruled out
for all 2D histograms as shown in previous studies [52].


B. Why BI analysis favors the basic Model


The basic Model alone (for near-central collisions) or 
the basic Model plus quadrupole (for noncentral colli¬ 
sions) is strongly preferred by BI analysis over FS-only 
models, even for the most-peripheral collisions where the 
1D data include only two significant DoF. The choice of
priors is not the reason; priors are applied consistently
for each parameter type within any model. The large
difference in evidence values is dominated by differences
in the fit covariance matrix. The r.m.s. parameter errors
for FS models are consistently 10-15 times smaller than
for the basic Model. Evidence differences correspond to
differences in model predictivity, as illustrated in the following
comparison and App. A.

Figure 18 shows sketches of joint data-parameter
spaces for an FS-only model (left) and the basic Model
(right) corresponding to bin-10 data. The parameter errors
for the FS-only model are typically $\sigma_k \approx 0.0007$.
The parameter errors for the basic Model in Table I are
0.007 for SS peak amplitude and width and 0.002 for
dipole amplitude, but with added cosine terms the errors
increase to $\approx 0.01$. The data errors for bin 10 are
$\sigma_n \approx 0.0037$. In terms of angles $\theta_{kn}$ defined in Eq. (A2)
$\tan(\theta_{kn}) \approx 1/3$ for the basic Model (20°) and 5 for the
FS-only model (80°). The errors and $\tan(\theta_{kn})$ are represented
by the dashed rectangles and the angles of the
diagonals (solid lines) in the two panels, as in Fig. 19.
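
The quoted angles can be checked with a few lines, assuming (as the quoted numbers suggest) that the tangent of the model angle is the ratio of data error to parameter error. The function names are illustrative; the error values are those given in the text for bin 10.

```python
import math

def tan_model_angle(sigma_data, sigma_param):
    """Tangent of the model angle, taken here as data error / parameter error."""
    return sigma_data / sigma_param

def model_angle_deg(sigma_data, sigma_param):
    """Model angle in degrees."""
    return math.degrees(math.atan2(sigma_data, sigma_param))

SIGMA_DATA_BIN10 = 0.0037      # bin-10 data error quoted in the text
SIGMA_PARAM_BASIC = 0.01       # basic Model with added cosines
SIGMA_PARAM_FS = 0.0007        # FS-only model

basic_angle = model_angle_deg(SIGMA_DATA_BIN10, SIGMA_PARAM_BASIC)
fs_angle = model_angle_deg(SIGMA_DATA_BIN10, SIGMA_PARAM_FS)
```

With these inputs `basic_angle` comes out close to 20° and `fs_angle` close to 80°, matching the rounded values in the paragraph above.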


In Fig. 18 the prior PDFs are represented by the vertical
dash-dotted lines in each panel and the arrows labeled
$\Delta_k$. For all cosine amplitudes the prior is $\approx 1/3$. For
the SS peak amplitude and width the priors are $\Delta_k \approx 1$.
We can estimate the evidences or predicted data volumes
based on the argument in App. A where the relation between
data-space volume and parameter-space volume
is determined by angle factors $\tan(\theta_{kn})$. For the basic
Model (right panel) even the larger priors ($\Delta_k \approx 1$) are
mapped to smaller data intervals ($\Delta_n \approx 1/3$), whereas
for FS-only models (left panel) smaller priors ($\Delta_k \approx 1/3$)
are mapped to larger data intervals ($\Delta_n \approx 5/3$). The result
is much smaller predicted data volumes $V_D(H)$ for
the basic Model, and consequently much higher evidence
and plausibility compared to FS-only models.

FIG. 18: Joint parameter-data space for two data models.
These panels are zoomed out from the scale of Fig. 19 to reveal
the PDFs distributed over the entire parameter and data
spaces. Given prior intervals $\Delta_k$ the model angles $\theta_{kn}$ determine
the magnitudes of the predicted data intervals $\Delta_n$,
the predicted data volume $V_D(H)$, and therefore the evidence
$E(D^*|H)$. The $\theta_{kn}$, and hence the Jacobian of a model function,
largely determine the predictivity of a model. For these
two models the typical $\tan(\theta_{kn})$ differ by factor 12.

The same argument applies to changes in evidence 
or information with increasing parameter number (e.g. 
added cosine terms). From Eq. (A9) (assuming comparable
$\chi^2$ values for competing models)

$$I = \sum_{k=1}^{K} \ln\left[\frac{\Delta_k}{\sqrt{2\pi}\,\sigma_k}\right] + \text{constant}. \tag{26}$$


With $\Delta_k \approx 1/3$ for all cosine terms, $\sigma_k \approx 0.0007$ for
FS-only models and $\sigma_k \approx 0.01$ for basic Model + cosines
the typical increment per cosine term is $I/K \approx 5$ for
FS-only models and 2.5 for basic Model + cosines, e.g.
consistent with Fig. 7. Thus, evidence and information
trends are determined mainly by the relative parameter
errors reflecting the Jacobians and model algebra.

The fitted-parameter errors reflect the algebraic structure
of the model, as discussed in App. A. Because a
Fourier series is orthogonal each coefficient is determined
independently. Since the Fourier model elements individually
do not resemble the data, a very “fragile” assembly
of terms is required, one that easily overfits the data
(treats noise as signal). Only a small range of FS-only
parameter values can reproduce a given data set, and the
parameter variances are consequently very small.

In contrast, the basic Model includes a Gaussian (motivated
by the data structure) with nonlinear parameter
$\sigma_\phi$ that covaries with other parameters. Thus, larger
ranges of basic-Model parameters can reproduce the data
adequately, and the parameter variances are correspondingly
larger. The basic Model is more “robust” because
on average the model elements individually look more like
isolated data components. If both models give the same
chi-squared fit the basic Model is preferred by BI analysis
because on average it is far more likely to describe the
data accurately (a larger fraction of the prior-delimited
parameter space provides an acceptable description for
the given data).

We conclude that the key issue for Bayesian model
comparisons is model predictivity. The basic Model is
highly predictive (therefore falsifiable), describing two
peaks (fixed at 0 and $\pi$), with one peak as wide as possible
and the other somewhat narrower. Two peak amplitudes
and a width are the only parameters. The basic
Model is consistent with the HEP/jets narrative but was
inferred from data without any theory assumptions. In
contrast, FS-only models can describe any structure on
azimuth, have no predictivity (are not falsifiable) and are
therefore strongly rejected by BI analysis. Model predictivity
[smallness of predicted data volume $V_D(H)$] is
determined largely by the algebraic structure of the data
model (Jacobian) as revealed by fitted-parameter errors
compared to data errors via the $\tan(\theta_{kn})$ elements.


C. Competition: extra cosine terms vs SS Gaussian 


The bin-10 results in Table I can be used to examine
the consequences of adding one or more cosine terms to
the basic Model when there is no corresponding data signal.
The $\chi^2$ is reduced in general, suggesting an improved
data description. However, in some cases the model parameters
undergo large changes seeming to indicate that
model parameters are very uncertain. To understand
the apparent contradiction we consider the bin-10 “worst
case” model (basic Model + quadrupole + sextupole +
octupole) appearing in the next-to-last column of Table I.

The model difference (“+$A_O$” − basic Model) for
each cosine coefficient is $\Delta A_D = 0.115$, $\Delta A_Q = 0.064$,
$\Delta A_S = 0.024$ and $\Delta A_O = 0.005$ for $m \in [1,4]$. The differences
correspond to the predicted Gaussian PS values
in Fig. 3 (red dashed curve). In effect, changes in the
cosine coefficients of the FS-only model are equivalent
to the fitted Gaussian already describing the data signal
correctly in the basic Model. The SS Gaussian required
by the data is effectively excluded from the data model
by the added cosine terms, reduced to a minor role.

The bin-10 result reveals a competition between the 
basic Model and a truncated FS to describe signal + 
noise. The competing truncated FS offers more flexibil¬ 
ity in accommodating noise compared to the monolithic 
Gaussian. The FS may “win” in terms of $\chi^2$, but a well-chosen
model element (Gaussian) describes only the signal
and excludes the noise. Referring to Fig. 6 the K = 6
(“+$A_O$”) model (open diamond) has the same $\chi^2$ value
as the K = 4 FS-only model (solid point) because the
former is effectively a K = 4 FS. The Gaussian, with two
parameters, has been excluded from the fit model owing
to noise competition, but its two parameters still contribute
to the Occam penalty. BI analysis then rejects
the unnecessary cosine terms in favor of the basic Model.


D. Evidence extending beyond single histograms

A model may describe data from some A-A centralities 
well but others poorly. Nevertheless, the model may be 
retained by convention because of desirable features (such 
as flow interpretations). Other forms of data selection 
($p_t$ cuts, 1D projections, ratio measures) present similar
issues. In response we propose to extend BI methods be¬ 
yond single data histograms, combining results into one 
comprehensive evaluation for competing data models. 

The mechanism is suggested by the nature of Bayesian 
evidence E. The evidence is a probability, and by the
rules governing probabilities the joint evidence for sev¬ 
eral cases should be the product of elementary evidences 
(assuming approximate independence). For instance, the 
evidence for a model of 200 GeV Au-Au collisions should 
be the product of evidences for individual centralities. If 
a model claiming to describe all data components is fal¬ 
sified for one component then it is falsified for all. More 
generally, a model that provides an adequate description 
for all cases may be preferred over a model that is favored 
for some cases but strongly disfavored for others. 
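
The product rule for joint evidence is most conveniently applied as a sum of log evidences. The sketch below assumes approximate independence between histograms, as stated above; the function names and all numerical log-evidence values are hypothetical and serve only to illustrate the “and” behavior.

```python
import math

def joint_log_evidence(log_evidences):
    """Log of the product of evidences = sum of the individual log evidences."""
    return math.fsum(log_evidences)

def joint_odds(log_evidences_1, log_evidences_2):
    """Odds of model 1 over model 2 across all data sets (equal model priors)."""
    return math.exp(joint_log_evidence(log_evidences_1)
                    - joint_log_evidence(log_evidences_2))

# Hypothetical log evidences for three centrality bins (not fitted values).
model_a = [-20.0, -21.0, -19.5]   # adequate everywhere
model_b = [-19.0, -30.0, -19.0]   # slightly better twice, much worse once
overall = joint_odds(model_a, model_b)
```

In this invented example model B wins two of the three individual comparisons, yet the joint odds strongly favor model A because of the one bin where B fails.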

That principle extends not only to A-A centralities but 
to different collision energies, A-B collision systems, spec¬ 
trum and correlation measures and data cuts. Evidence 
E as a product measure introduces an “and” condition 
for data description. A candidate model must address all 
available data within its parameter space or be rejected. 


E. Implications for theoretical narratives 


As noted in the introduction HEP/jets and QGP/flow 
narratives currently compete to describe and interpret 
high-energy nuclear-collision data through choices of data 
model and emphasis on specific data and measured quan¬ 
tities. The HEP/jets narrative predicts two dijet-related 
peaks on 1D azimuth, just what the basic Model describes.
Almost all 1D azimuth correlation data from
the RHIC are described by the basic Model + quadrupole
with modest parameter variations. The QGP/flow narrative
prefers various forms of the FS-only data model
interpreted physically in a flow context, from a single cosine
($v_2$, index k = 2) to several cosines interpreted to
include “higher harmonic” flows (index $k \in [1,5]$).

In the present study we apply BI methods to 1D azimuth
data models associated with the two narratives. BI
analysis strongly favors the basic Model in all cases, combined
with an additional quadrupole $\cos(2\phi)$ term except
for the most-central data. The FS-only model is strongly
rejected in all cases. As discussed in Sec. X B and App. A
the main reason for BI rejection is lack of predictivity for 
FS-only models, whereas the basic Model is strongly pre¬ 
dictive and therefore falsifiable. The present BI analysis 
thus seems to support the HEP/jets narrative and reject 
the QGP/flow narrative per their data models. 

It could be argued that application of BI methods to
data models represents an arbitrary choice motivated by
interest in a specific outcome. However, we are faced with
the requirement to evaluate conflicting data models according
to some neutral criteria. $\chi^2$ minimization always
prefers more-complex data models that may reveal little 
about data structure and possible physical mechanisms. 
Flow interpretations are always possible for FS-only mod¬ 
els, but such models cannot exclude a dijet interpretation 
since they are able to describe any data configuration. 

Additional criteria are therefore required to test data 
models. Guidance as to choice is provided by the role 
of rational inference within the scientific method. It is
recognized that physical theories cannot be proven, can
only be falsified by data, requiring that candidate theories
be predictive. Unpredictive theories are not falsifiable
and are therefore rejected as candidates. In a Bayesian
context predictivity is measured by information I and
evidence E as demonstrated in this study. For a well-tested
physical theory H encountering new data $D^*$ the
information $I \approx 0$, and the predicted data-space volume
$V_D(H)$ is small. If $D^* \notin V_D(H)$ the theory is falsified
but $D^* \in V_D(H)$ results in plausibility $p(H|D^*) \to 1$:
dramatically different results.

In the present analysis we encounter not competing 
physical theories but competing data models serving as 
proxies. BI analysis evaluates data models according to 
predictivity, i.e. the degree of restriction on allowed data 
configurations. We conclude that the basic Model with 
optional quadrupole component is very predictive, corre¬ 
sponding to small information gain from newly-received 
data and consequent small predicted data-space volume. 
FS-only models are not predictive, can accommodate any 
data configuration, and are therefore rejected. 


XI. SUMMARY AND CONCLUSIONS 

Based on data from the relativistic heavy ion col¬ 
lider (RHIC) and large hadron collider (LHC) claims 
have been made for formation in high-energy nucleus- 
nucleus (A-A) collisions of a strongly-coupled quark- 
gluon plasma (sQGP) with small viscosity - a “perfect 
liquid.” Such claims are based mainly on measurements 
of Fourier coefficients $v_m$ of cosine terms $\cos(m\phi)$ used
to describe two-particle correlations on azimuth $\phi$ and
interpreted to represent flows, especially $v_2$ representing
elliptic flow. In the flow context dijets play a comparatively
negligible role in final-state correlation structure.

Modeling azimuth correlations by truncated Fourier 
series or individual cosine terms is not unique. Other 
model functions can describe the same data equally well 
and do suggest alternative physical interpretations, es¬ 
pecially substantial contributions from dijet production. 
In effect, two physics narratives compete to describe and 
interpret the same data. In one narrative collision dy¬ 
namics is dominated by dijet production. In the other 
narrative collision dynamics is dominated by a dense, 
flowing QCD medium. Opposing narratives appear to 
be supported by their respective data models. To break 
the deadlock a method is required to evaluate model functions
according to neutral criteria and identify a preferred
model. 

In this study we introduce Bayesian Inference (BI) to 
evaluate competing model functions. BI analysis relies on 
a combination of the usual $\chi^2$ goodness-of-fit parameter
and information I derived from the fit covariance matrix.
I quantifies changes in the data model arising from acquisition
of new data and represents an Occam penalty for
excessive model complexity. Combination $\chi^2/2 + I$ leads
to evidence parameter E that determines the plausibility
of each model when confronted with new data values.
The goal is to rank data models according to BI criteria 
without resorting to a priori physics assumptions. 

We apply several representative model functions to an¬ 
gular correlation data and evaluate the model perfor¬ 
mance with BI methods. The data are published 2D 
angular correlations from three centrality classes of 200
GeV Au-Au collisions on $(\eta, \phi)$. 2D histograms are projected
onto periodic azimuth $\phi$ by integration over pseudorapidity
$\eta$. The three collision centralities include the
centrality extremes (most central and most peripheral) 
and an intermediate centrality that requires a separate 
azimuth-quadrupole model element in the data model. 

Model functions include (a) a “basic Model” consisting 
of a same-side (SS) peak modeled by a Gaussian at $\phi = 0$
and an away-side (AS) peak at $\pi$ modeled by a cylindrical
dipole $\cos(\phi - \pi)$, (b) the basic Model plus one or more
additional cosine terms and (c) several Fourier-series (FS- 
only) models consisting only of one or more cosine terms. 

For each model-data combination we obtain the best-fit
$\chi^2$ and information I and combine them to form evidence
$E = \exp[-(\chi^2/2 + I)]$ interpreted in a BI context
as the probability of data $D^*$ given model H. Information
I is the logarithm of a volume ratio. The numerator
is a “prior” volume on the space of model parameters
determined consistently from model to model based on
the nature of the parameters. The denominator is the
volume on model parameters determined by the fit covariance
matrix. Thus, I measures information received
by the model from new data and is interpreted in the BI
context as an Occam penalty, with reference to Occam’s
razor. With increasing model complexity (degrees of freedom)
$\chi^2$ typically decreases but I increases, leading to a
maximum in evidence E for some model configuration.
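
The ranking step can be sketched as follows: each model's best-fit chi-squared and information are combined into a log evidence, and relative model probabilities follow under the assumption of equal model priors. The function names are illustrative and the (chi-squared, information) pairs are invented for the example; they are not fit results from this study.

```python
import math

def log_evidence(chi2, information):
    """ln E = -(chi2 / 2 + I)."""
    return -(0.5 * chi2 + information)

def model_probabilities(fits):
    """Relative model probabilities for equal model priors.

    fits maps a model name to a (chi2, information) pair.
    """
    log_e = {name: log_evidence(chi2, info) for name, (chi2, info) in fits.items()}
    top = max(log_e.values())  # subtract the maximum to avoid underflow
    weights = {name: math.exp(value - top) for name, value in log_e.items()}
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}

# Hypothetical (chi2, I) pairs, invented for illustration only.
example_fits = {
    "basic Model (K=3)": (22.0, 8.0),
    "basic Model + quadrupole (K=4)": (21.0, 10.5),
    "FS-only (K=4)": (22.0, 21.0),
}
probabilities = model_probabilities(example_fits)
ranking = sorted(probabilities, key=probabilities.get, reverse=True)
```

In this made-up case the extra cosine term lowers chi-squared slightly but raises the information by more, so the simpler model ranks first.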

For each centrality we rank models according to evi¬ 
dence E which can vary over several orders of magnitude. 
The following systematics emerge: FS-only models (c) 
are rejected in all cases by at least a factor 100. The basic
Model (a) representing peaks at 0 and $\pi$ is preferred in
all cases. A cylindrical quadrupole $\cos(2\phi)$ is required to
accompany the basic Model in some cases but is rejected
for most-central Au-Au collisions. “Higher harmonics”
$\cos(m\phi)$ for $m > 2$ appended to the basic Model are rejected
in all cases. A model consisting of Gaussian +
dipole $\cos(\phi)$ + quadrupole $\cos(2\phi)$ provides good data
descriptions in all cases. Those results are generally consistent
with a power spectrum analysis of data histograms
in which signal and noise components are identified.

A detailed study of the geometric structure of Bayesian 
analysis reveals that given comparable fit quality ($\chi^2$) for
various data models the dominant factor in determining
E is the ratio of data errors to parameter errors. Those
ratios estimate elements of the Jacobian matrix charac¬ 
terizing the model function as a map from parameters 
to data. Smaller error ratios indicate smaller predicted 
volumes in the data space: the data model is more pre¬ 
dictive. Predictivity is then the determining factor in 
Bayesian model evaluation. FS-only models have no pre¬ 
dictivity, can describe any data configuration and are 
strongly rejected by Bayesian analysis. The basic Model 
is very predictive and is therefore strongly favored. 

We conclude from this study that QGP/flow narra¬ 
tives based on FS-only models or models with multiple 
cosine terms are disfavored because the requisite data 
models are rejected by Bayesian analysis. FS-only mod¬ 
els are not predictive, in particular cannot exclude dijets 
as a dominant collision mechanism. Dijet-based narra¬ 
tives are favored in that the basic Model, with peaks at 
0 and $\pi$ that may represent dijet structure expected in
such narratives, is strongly preferred by Bayesian analy¬ 
sis. This conclusion should of course be tested more gen¬ 
erally with other data and contexts such as unprojected 
2D angular-correlation histograms and their correspond¬ 
ing more-complex model parametrizations. 

Appendix A: BI analysis geometry

Section II indicates that BI analysis provides two im¬ 
portant results: (a) an improved posterior PDF on model 
parameters given newly-acquired data and (b) a quanti¬ 
tative method for comparing data models to identify the 
model function that achieves the best compromise be¬ 
tween accurate data description and minimum Occam 
penalty. In this appendix we examine the geometric 
structure of BI analysis on the joint parameter-data space 
to better understand how the BI method works. We find 
that evidence $E(D^*|H)$ is a measure of the predictivity of
a model: BI analysis prefers the most predictive model
that also describes the data with a satisfactory $\chi^2$.


1. Data space vs model-parameter space 

BI analysis is based on the relation between parameter
space w and data space D. The data space is an N-dimensional
space with axes $D_n$. The model-parameter
space is a K-dimensional space with axes $w_k$. Data model
H is defined in part by model function $F(D|wH) \to D(w)$
that relates a specific set of parameter values (point
$w^*$ in w) to a specific set of data values (point $D^*$ in D).
Note that data D and data errors $\sigma_D$ are vectors with
elements $D_n$ and $\sigma_n$. Similarly, parameters w and parameter
errors $\sigma_w$ are vectors with elements $w_k$ and $\sigma_k$.

As set out in Sec. II C a joint parameter-data PDF
$p(wD|H) = p(D|wH)\,p(w|H)$ representing model H can
be defined on the joint space $w \times D$. If data values originate
as random samples from some parent distribution
related to model H with specific parameter values $w^*$
the resulting data distribution may be described by a localized
conditional PDF $p(D|w^*H)$ on D with estimated
means $D_n$ and standard deviations $\sigma_n$. Conversely, for
a specific set of data values $D^*$ the resulting parameter
distribution may be described by a localized conditional
PDF $p(w|D^*H)$ on w with estimated means $w_k$ and standard
deviations $\sigma_k$.


Whereas conditional PDFs $p(D|w^*H)$ and $p(w|D^*H)$
may be localized near their respective modes D and w,
marginal PDFs $p(w|H)$ and $p(D|H)$ (dash-dotted lines
in Fig. 19) may be nearly uniform over the local intervals
relevant to the peaked functions. In what follows
we extend the BI methodology to obtain a global geometric
relation between parameter space and data space
pursuant to model comparisons. We refer to conditional
PDFs $p(D|wH)$ and $p(w|DH)$ as local and marginal
PDFs $p(w|H)$ and $p(D|H)$ as global.


2. Angle representation of model structure 


For each data model H the primary BI elements are
model function D(w), prior PDF $p(w|H)$ (assuming a
uniform prior on parameters $w_k$), some specific data $D^*$
and their uncertainties $\sigma_D$. A joint PDF $p(wD|H)$ determined
by function D(w) and errors $\sigma_D$ is then implicit.
In a model fit to some specific data $D^*$ the data errors
$\sigma_D$ and model function D(w) are combined to determine
the most-probable model parameters w and their uncertainties
$\sigma_w$, or preferably a posterior PDF $p(w|D^*H)$ on
space w (as in Sec. II).

Figure 19 provides a schematic of data-model correspondence,
with model parameter $w_k$ and data element
$D_n$ in the local neighborhood of specific data values $D^*$.
The diagonal line represents the model function D(w).
The data values $D^*_n$ have estimated standard deviations
$\sigma_n$ (data errors). The model function with data errors determines
the gray band representing joint PDF $p(wD|H)$.

The likelihood function L(D*\wH) on Wk determines 
the most-probable parameter values Wk and their stan¬ 
dard deviations (Jk corresponding to data values D* and 
data errors ajo- As indicated by the bold vertical ar¬ 
row in Fig. [19] the likelihood function, with specific data 
errors, in effect probes the local algebraic structure of 
model function D(w ) near data D* by relating data er¬ 
rors a n to parameter errors ak- The geometric relation 
between data and parameters is characterized by angles 


FIG. 19: Schematic representation of the local relation in 
space $w \times D$ between data $D$ and model parameters $w$ with 
specific elements $D_n$ and $w_k$, especially the errors. The solid 
diagonal represents model function $D(w)$. The hatched band 
arises from data errors, specifically $\sigma_n$, corresponding then 
to parameter errors, specifically $\sigma_k$. Angle $\theta_{kn}$ relating data 
and model errors is approximately a Jacobian element characterizing 
the algebraic structure of the data model. The 
dash-dotted lines represent parameter-prior and data PDFs. 


$\theta_{kn}$ defined by 

$$\tan(\theta_{kn}) = \frac{\sigma_n}{\sigma_k} \tag{A1}$$

that relate data and parameter spaces. If the Hessian 
matrix for this application is diagonal (i.e. correlations 
among model parameters are small) those angles correspond 
to elements of the model-function Jacobian $J_D(w)$ 

$$\tan(\theta_{kn}) \approx \sqrt{N}\,\frac{\partial D_n}{\partial w_k} \in J_D(w), \tag{A2}$$


in the following sense: If the diagonal elements of the 
Hessian are approximated by 

$$H_{kk} \approx \frac{1}{\sigma_k^2} \tag{A3}$$

the partial derivative in Eq. (A2) represents an r.m.s. 
quantity derived by averaging squared Jacobian elements 
over all data elements (weighted by the data errors). The 
same Jacobian structure may determine the relation between 
global structures (PDFs) on $w$ and on $D$ by extrapolation, 
as discussed in the next subsection. 
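As an illustrative sketch (not part of the original analysis), the relation among Eqs. (A1)-(A3) can be checked numerically for a toy linear model. The sketch assumes a Gaussian likelihood with equal data errors, for which the diagonal Hessian elements are the sums of squared Jacobian elements divided by the data variance. The toy model $D_n = w_0 + w_1 \cos(x_n)$, the function name `model_angles` and all numerical values are hypothetical choices made only for illustration.

```python
import numpy as np

def model_angles(jacobian, sigma_data):
    """Return tan(theta_k) = sigma_data / sigma_k for each parameter k.

    jacobian   : (N, K) array of dD_n/dw_k (assumed known analytically)
    sigma_data : common data error (equal errors are an assumption here)
    """
    # Diagonal Hessian of a Gaussian log-likelihood (assumed form):
    # H_kk = sum_n (dD_n/dw_k)^2 / sigma_data^2
    h_diag = np.sum(jacobian**2, axis=0) / sigma_data**2
    sigma_k = 1.0 / np.sqrt(h_diag)   # parameter errors, cf. Eq. (A3)
    return sigma_data / sigma_k       # cf. Eq. (A1)

# Hypothetical toy model D_n = w_0 + w_1 * cos(x_n) on 24 azimuth bins
x = np.linspace(-np.pi, np.pi, 24, endpoint=False)
J = np.column_stack([np.ones_like(x), np.cos(x)])
tan_theta = model_angles(J, sigma_data=0.01)

# Compare with sqrt(N) times the r.m.s. Jacobian element, cf. Eq. (A2)
rms_jacobian = np.sqrt(np.mean(J**2, axis=0))
print(tan_theta, np.sqrt(len(x)) * rms_jacobian)
```

Under these assumptions the two printed arrays agree, which is the sense in which the angle is an r.m.s. Jacobian element scaled by $\sqrt{N}$.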


3. Local and global volumes vs PDFs 

We can define effective volumes (generalized concept 
including lengths, areas, etc.) in spaces $w$ and $D$ in relation 
to the key PDFs associated with BI. By volume 
we mean the result of integrating a unit-amplitude (at 
the mode) function over some bounded subspace including 
all points where the function is significantly nonzero. 
For example the “volume” of unit-amplitude 1D Gaussian 
$e^{-x^2/2\sigma^2}$ is $\sqrt{2\pi}\,\sigma$. Dividing a unit-amplitude function 
by its volume results in a normalized PDF. 
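The volume definition can be illustrated with a short numerical sketch, given here only as an example under stated assumptions: the function name `effective_volume`, the integration range of eight standard deviations and the width value 0.65 are hypothetical choices, and a simple trapezoid rule stands in for whatever integration one prefers.

```python
import numpy as np

def effective_volume(func, lo, hi, num=20001):
    """Integrate func over [lo, hi] after scaling it to unit amplitude at its mode."""
    x = np.linspace(lo, hi, num)
    y = func(x)
    y = y / y.max()                                   # unit amplitude at the mode
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))  # trapezoid rule

sigma = 0.65  # hypothetical width, used only for illustration
volume = effective_volume(lambda x: np.exp(-x**2 / (2 * sigma**2)),
                          -8 * sigma, 8 * sigma)
print(volume, np.sqrt(2 * np.pi) * sigma)
```

Dividing the unit-amplitude function by `volume` then gives a function that integrates to one over the same interval, i.e. a normalized PDF.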

The marginal PDF on data $D$ is obtained by integrating 
$p(wD|H)$ over space $w$ using the chain rule 

$$\begin{aligned} p(D|H) &= \int dw\, p(D|wH)\, p(w|H) \\ &\approx \frac{1}{V_w(H)} \int_{V_w} dw\, p(D|wH). \end{aligned} \tag{A4}$$

The (assumed uniform) prior PDF $p(w|H)$ defines an effective 
boundary surface for the integral over $w$, represented 
schematically by the second line where each parameter 
$w_k$ is integrated over a prior interval $\Delta_k$ and 
$\prod_{k=1}^{K} \Delta_k = V_w(H) = 1/p(w|H)$ as in Eq. (10). 

For some values of $D$ the integrand may be nonzero 
only outside the volume $V_w(H)$, in which case $p(D|H) = 0$. 
If $p(D|H)$ is nonzero and approximately uniform 
within limiting intervals $\Delta_n$ then $p(D|H) \approx 1/V_D(H)$ 
with $V_D(H) = \prod_{n=1}^{N} \Delta_n$, and Eq. (A4) represents a relation 
between the two global volumes $V_w(H)$ and $V_D(H)$. 

The dual role of $p(D|wH)$ as conditional PDF on $D$ 
and as likelihood function on $w$ is a central issue. With 
$w^*$ as a specific condition $p(D|w^*H)$ is a unit-normal 
peaked distribution on $D$ approximated by a Gaussian 

$$p(D|w^*H) \approx \frac{1}{V_D(w^*|H)}\, G(D|w^*), \tag{A5}$$

where $V_D(w^*|H) = \prod_{n=1}^{N} \{\sqrt{2\pi}\,\sigma_n\}$. As the likelihood 
function $L(D^*|wH)$ it is an unnormalized peaked distribution 
on parameter space $w$ which in the Laplace approximation 
is proportional to Gaussian $G(w|D^*)$ with 
its integral 

$$\int dw\, G(w|D^*) \approx V_w(D^*|H) = \prod_{k=1}^{K} \{\sqrt{2\pi}\,\sigma_k\}, \tag{A6}$$

where $V_w(D^*|H)$ approximates $\sqrt{(2\pi)^K \det C_k}$ appearing 
in Eq. (5). We have thus defined four volumes, two 
each on $w$ and $D$: two local and two global. 


4. Consequences for the data space - predictivity 

Using the Laplace approximation the BI evidence as 
defined in Eq. (5) can be written in terms of volumes as 

$$\begin{aligned} E(D^*|H) &= \int dw\, L(D^*|wH)\, p(w|H) \\ &\approx L(D^*|\tilde{w}H)\, \frac{V_w(D^*|H)}{V_w(H)}. \end{aligned} \tag{A7}$$

From Eq. (A5) the maximum likelihood $L(D^*|\tilde{w}H)$ is 

$$p(D^*|\tilde{w}H) \approx \frac{G(D^*|\tilde{w})}{V_D(w|H)}, \tag{A8}$$

with $G(D^*|\tilde{w}) = \exp(-\chi^2/2)$. If $D^*$ falls outside $V_D(H)$ 
evidence $E(D^*|H) = 0$ (the data model is falsified). If 
not $E(D^*|H) \approx p(D|H) \approx 1/V_D(H)$ and we then have 

$$\frac{V_D(w|H)}{V_D(H)} \approx G(D^*|\tilde{w})\, \frac{V_w(D^*|H)}{V_w(H)} \approx \exp[-(\chi^2/2 + I)], \tag{A9}$$

(with information $I$ as defined in Sec. II E) relating the 
four volumes, where factor $V_D(w|H)$ is a property of the 
data only, common to all models. 

We can relate that result to the model angles (Jacobian) 
from Sec. A 2. Assuming data errors $\sigma_n$ are approximately 
equal the local-volume ratio is factorized as 

$$\begin{aligned} \frac{V_w(D^*|H)}{V_D(w|H)} &= \frac{\prod_{k=1}^{K} \{\sqrt{2\pi}\,\sigma_k\}}{\prod_{n=1}^{N} \{\sqrt{2\pi}\,\sigma_n\}} \\ &= \frac{1}{V_D(w|H)^{(N-K)/N}}\, \frac{1}{\prod_{k=1}^{K} \tan(\theta_{kn})}, \end{aligned} \tag{A10}$$

where the first factor in the second line depends only on 
data common to all models with parameter number $K$, 
and the second factor is unique to a specific model. 

Rearranging Eq. (A9) (without Gaussian factor) as 

$$\frac{V_w(H)}{V_D(H)} \approx \frac{V_w(D^*|H)}{V_D(w|H)} \tag{A11}$$

we note that the local-volume ratio on the right, obtained 
from the likelihood function and equivalent to the model-function 
Jacobian, estimates the global-volume ratio on 
the left by extrapolation. $V_D(H)$ is then the data volume 
predicted by a combination of prior PDF on model 
parameters and the model function. If the specific data 
values $D^*$ fall outside $V_D(H)$ then $p(D|H) = 0$ and the 
model is falsified. The smaller the predicted data volume 
the larger the evidence and the more favored the model. 
Predictivity is then an essential feature of data models. 

The Occam penalty central to BI analysis represents 
not only excess parameter number and prior volume as a 
cost but also model predictivity as a benefit. Two models 
with the same parameter number and priors may have 
very different plausibilities because of differences in their 
algebraic structure and therefore predictivity. A model 
with substantially greater predictivity may even be favored 
over one with fewer parameters (Sec. VIII). 


Appendix B: Hessian matrix errors 


Since comparisons among data models in this BI study 
rely critically on fitted-parameter errors it is important 
to establish the degree of uncertainty in the estimated 
errors. Parameter variances are determined by the curvatures 
(second derivatives) of the log-likelihood function 
at the likelihood maximum. The Hessian is defined by 

$$H_{jk} = -\left.\frac{\partial^2}{\partial w_j\, \partial w_k} \log L(D^*|wH)\right|_{w=\tilde{w}}. \tag{B1}$$

For linear parameters such as the coefficients of a Fourier 
series the second derivatives are independent of the parameter 
values $w$. Specifically, for 

$$\log L(D^*|wH) \approx -\sum_{n=1}^{N} \frac{1}{2}\, \frac{\left(y_n - \sum_k w_k f_k(x_n)\right)^2}{\sigma_n^2} \tag{B2}$$

the second derivatives are 

$$\frac{\partial^2}{\partial w_j\, \partial w_k} \log L(D^*|wH) \approx -\sum_{n=1}^{N} \frac{f_j(x_n)\, f_k(x_n)}{\sigma_n^2}. \tag{B3}$$

If the model functions are orthogonal we have 

$$\sum_{n=1}^{N} \frac{f_j(x_n)\, f_k(x_n)}{\sigma_n^2} \rightarrow \delta_{jk} \sum_{n=1}^{N} \frac{f_j(x_n)^2}{\sigma_n^2} \tag{B4}$$

and the Hessian matrix is diagonal. In other words, for 
linear parameters such as the coefficients of a Fourier 
series the parameter variances depend only on the sample 
points (positions) and the algebraic structure of the 
model function. For a Fourier series there is little flexibility 
- since there is no uncertainty in the model function 
the only uncertainty in the Hessian arises from the 
uncertainty in the sample positions. Labeling the uncertainties 
in the $x$ coordinate due to bin width $\delta x$ as 
$\sigma_{x_n}^2 = (\delta x)^2/12$, the uncertainty in the diagonal Hessian 
elements is 

$$\sigma_{H_{jj}}^2 = \sum_{n=1}^{N} \left[\frac{2 f_j(x_n)\, f'_j(x_n)}{\sigma_n^2}\right]^2 \frac{(\delta x)^2}{12}. \tag{B5}$$

There is only one nonlinear parameter, namely the width 
$\sigma_\phi$ of the SS Gaussian in the basic Model, but a similar 
formula should apply to that case as well. 

For bin 8 the diagonals of Hessian and errors for the 
FS-only model are 

$H_{jj}$ = {8.19, 8.22, 8.23, 8.23, 8.23, 8.23,   (B6) 
8.23, 8.23, 8.23, 8.22, 8.19} $\times 10^6$   (B7) 
$\sigma_{H_{jj}}$ = {3.46, 6.91, 10.4, 13.8, 17.3, 0.00145,   (B8) 
24.2, 27.7, 31.1, 34.6, 38.} $\times 10^4$.   (B9) 

The relative errors thus vary from 0.5 percent up to five 
percent. For the basic-Model fit in bin 8 we obtain 

$w = \{a_1, \sigma_\phi, a_D\}$   (B10) 
$H_w$ = {0.478, 1.55, 8.19} $\times 10^6$   (B11) 
$\sigma_{H_w}$ = {1.05, 3.1, 3.47} $\times 10^5$,   (B12) 

implying relative errors from 5 percent to 22 percent. 
Thus the Hessian matrix elements (likelihood curvatures), 
and therefore the error estimates for fitted model 
parameters, are determined to a few percent in this study. 


Appendix C: Laplace approximation 

Use of the Laplace approximation for the likelihood 
function in this study may be questioned due to possible 
inaccuracies. The Laplace approximation for the likelihood 
function $\log L(D^*|wH) = -g[w]$ is 




$$\begin{aligned} g[w] &= \sum_{n=1}^{N} \frac{1}{2}\, \frac{\left(y_n - \sum_k f_k(w, x_n)\right)^2}{\sigma_n^2} \qquad \text{(C1)} \\ &= g[\tilde{w}] + (w - \tilde{w})\, \frac{dg[\tilde{w}]}{dw} + \frac{(w - \tilde{w})^2}{2}\, \frac{d^2 g[\tilde{w}]}{dw^2} + \frac{(w - \tilde{w})^3}{6}\, \frac{d^3 g[\tilde{w}]}{dw^3} + \cdots \qquad \text{(C2)} \end{aligned}$$

where the first-derivative term in the Taylor series is zero 
by definition. The approximation then implies 



where we have carried the first correction term. For linear 
parameters, including all of the parameters except 
$\sigma_\phi$, there is no fourth derivative as we have pointed 
out. The Laplace approximation is then exact (except 
for sub-exponential corrections caused by replacing the 
truncated Gaussian by a non-truncated version). For 
the single non-linear parameter in the basic Model the 
fourth derivative $1.09 \times 10^6$ must be divided by $1.55 \times 10^6$ 
squared, implying that corrections are of the order $10^{-5}$ 
- of the same order as the fit-model Hessian. The effective 
number of data points is $10^5$ and any corrections 
are of inverse order. Due to a large primary-data volume 
[more than a million Au-Au collisions with (on average) 
hundreds of particles per collision] the Laplace method 
as applied in the present study is very accurate. 
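The size of such corrections can be illustrated with a toy numerical check, offered only as a sketch under stated assumptions. It uses a hypothetical one-parameter function $g(w) = a w^2/2 + b w^4/24$ (so the second and fourth derivatives at the mode are $a$ and $b$, and the third derivative vanishes) and the standard textbook first correction factor $1 - b/(8a^2)$ for that symmetric case. The function name `laplace_check` and the values of `a` and `b` are not taken from the study.

```python
import numpy as np

def laplace_check(a, b, num=200001):
    """Compare the integral of exp(-g) with its Laplace estimate.

    g(w) = a*w**2/2 + b*w**4/24 is a hypothetical toy function with mode at w = 0.
    Returns (numerical integral, Laplace estimate, Laplace with first correction).
    """
    half_range = 10.0 / np.sqrt(a)            # ten Gaussian widths on each side
    w = np.linspace(-half_range, half_range, num)
    y = np.exp(-(a * w**2 / 2 + b * w**4 / 24))
    exact = np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(w))  # trapezoid rule
    laplace = np.sqrt(2 * np.pi / a)          # Gaussian (Laplace) estimate
    corrected = laplace * (1 - b / (8 * a**2))  # assumed first correction term
    return exact, laplace, corrected

exact, laplace, corrected = laplace_check(a=50.0, b=40.0)
print(exact / laplace, corrected / laplace)
```

For these toy values the relative correction is a fourth derivative divided by a squared second derivative, the same combination discussed above, and the corrected estimate lies closer to the numerical integral than the plain Laplace estimate does.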


Appendix D: Periodic peak arrays 

Because azimuth $\phi$ is a periodic variable any 1D structure 
on $\phi_\Delta$ can be described by a discrete Fourier cosine 
series FS. But representing an arbitrary 1D projection 
by a few terms of a 1D Fourier series can be misleading. 
We should acknowledge the possibility that specific 
peak structures may be part of the azimuth distribution. 
In this appendix we consider the FS representation of a 
periodic Gaussian peak array on 1D azimuth. 

The peaks observed at $\phi_\Delta = 0$ (SS, same-side) and 
$\phi_\Delta = \pi$ (AS, away-side) in all 1D azimuth histograms 
from high-energy nuclear collisions are actually elements 
of separate periodic peak arrays described by cosine series. 
The SS array is centered on even multiples of $\pi$, 
the AS array on odd multiples. Nearest array elements 
outside a $2\pi$ interval (image peaks) produce significant 
structure within the observed interval and must be included 
in fit models to insure valid data descriptions. 

Each peak array (SS or AS) may be represented by a 
FS of the form 

$$S^{\pm}(\phi_\Delta; \sigma_{\phi_\Delta}, n) = \sum_{m=-N/2}^{N/2} F_{m,n} \cos(m[\phi_\Delta - n\pi]), \tag{D1}$$

where the $F_{m,n}$ are functions of r.m.s. peak width $\sigma_{\phi_\Delta}$ 
defined below. Since $n$ is even for SS peak arrays ($+$) and 
odd for AS arrays ($-$) odd multipoles must be explicitly 
labeled as SS or AS. The terms represent $2m$ poles, e.g. 
dipole ($m = 1$), quadrupole ($m = 2$), sextupole ($m = 3$) 
and octupole ($m = 4$), referring to cylindrical multipoles. 



FIG. 20: Left: Periodic arrays of SS (dash-dotted) and AS 
(dashed) peaks. The SS peaks are Gaussians. The AS peaks 
are described by a dipole. The dotted sinusoid corresponds 
to the $m = 2$ Fourier component of the SS peaks. Right: 
Evaluation of Eq. (D2) for four values of $m$, with $\sigma_{\phi_\Delta} = 0.65$. 


The Fourier amplitudes $F_m$ of a unit-amplitude Gaussian 
peak array are defined (for $m \neq 0$) as functions of 
the r.m.s. peak width by 

$$2F_m(\sigma_{\phi_\Delta}) = \sqrt{2/\pi}\; \sigma_{\phi_\Delta} \exp(-m^2 \sigma_{\phi_\Delta}^2/2). \tag{D2}$$


As peak width $\sigma_{\phi_\Delta}$ increases, the width on index $m$ decreases 
and the number of significant terms in the series 
Eq. (D1) decreases. The limiting case is $\sigma_{\phi_\Delta} \approx \pi/2$, for 
which the peak array is approximated by a constant plus 
dipole term. For narrower (SS) peaks Fourier terms with 
$m > 1$ become significant, and a Gaussian function is the 
more efficient peak model, as demonstrated in this study. 
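The amplitudes of Eq. (D2) can be cross-checked against a direct numerical Fourier analysis of a periodic Gaussian peak array. The following is a minimal sketch under stated assumptions: the function names `two_F_m` and `peak_array`, the number of image peaks retained and the grid size are hypothetical choices, and the peak array is built by summing unit-amplitude Gaussians on even multiples of $\pi$.

```python
import numpy as np

def two_F_m(m, sigma):
    """Eq. (D2): amplitude 2*F_m for a unit-amplitude Gaussian peak array."""
    return np.sqrt(2.0 / np.pi) * sigma * np.exp(-0.5 * (m * sigma) ** 2)

def peak_array(phi, sigma, n_images=3):
    """Sum of unit-amplitude Gaussians centered on even multiples of pi (SS array)."""
    centers = 2.0 * np.pi * np.arange(-n_images, n_images + 1)
    return np.exp(-0.5 * ((phi[:, None] - centers) / sigma) ** 2).sum(axis=1)

sigma_ss = 0.65                                   # SS width quoted in the text
phi = np.linspace(-np.pi, np.pi, 4096, endpoint=False)
ss = peak_array(phi, sigma_ss)

orders = np.arange(1, 5)                          # dipole through octupole
# Cosine coefficients over one period: (1/pi) * integral of ss * cos(m*phi)
numeric = np.array([2.0 * np.mean(ss * np.cos(m * phi)) for m in orders])
analytic = two_F_m(orders, sigma_ss)

# For a width near pi/2 the quadrupole is small compared with the dipole
ratio_as = two_F_m(2, np.pi / 2) / two_F_m(1, np.pi / 2)
print(numeric, analytic, ratio_as)
```

The numerical and analytic coefficients coincide for the SS width, and the small quadrupole-to-dipole ratio at a width near $\pi/2$ reflects the statement that such a broad peak array is approximately a constant plus a dipole.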


Fig. 20 (left panel) shows peak arrays (solid points) for 
SS and AS peaks extending beyond one $2\pi$ period. The 
SS Gaussian peak array with $\sigma_{\phi_\Delta} = 0.65$ (typical value 
for all but peripheral A-A collisions) is the dash-dotted 
curve, the AS array with $\sigma_{\phi_\Delta} \approx \pi/2$ is the dashed curve 
(approximately dipole in this case). The dotted curve 
represents the quadrupole term of the SS peak array. 


Figure 20 (right panel) shows Eq. (D2) for $\sigma_{\phi_\Delta} = 0.65$ 
with the first few multipole coefficients marked for reference 
(open circles). For that width the jet-related 
quadrupole amplitude is $2F_2 \approx 0.22$. If the SS peak is not 
separately described by a Gaussian peak model $F_2$ represents 
the dominant jet-related nonflow contribution to 
$v_2^2\{2\} \approx v_2^2\{EP\}$ data in the form $\rho_0 v_2^2$. Similarly, other 
Fourier components of the SS jet peak and the dipole 
component of the AS peak could be misidentified as flow 
components, including “higher harmonic” flows [52]. 